How to use the Robots.txt Validator
robots.txt syntax has subtleties — wildcards, end-anchors, conflicting allow/disallow rules. The validator simulates a crawler request and tells you whether a specific URL would be allowed or blocked.
Paste your robots.txt content
Either paste the full robots.txt text or fetch from a live URL. The validator parses it the way Google's crawler does.
Enter test URLs
Add the URLs you want to verify — admin paths you intend to block, public paths you intend to allow, and any edge cases (URLs with query strings, trailing slashes).
Pick the user-agent
Test as Googlebot, Bingbot, AhrefsBot, GPTBot, etc. User-agent-specific rules override the wildcard * block, so per-bot testing reveals conflicts.
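The override behavior is easiest to see in a file with two groups. A minimal illustration (the paths are examples, not recommendations):

```
User-agent: *
Disallow: /private/

# Googlebot matches its own group and ignores the * group above,
# so /private/ is NOT blocked for Googlebot — only /drafts/ is.
User-agent: Googlebot
Disallow: /drafts/
```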
Review the verdict per URL
Each URL shows ALLOWED or BLOCKED, plus the specific rule that decided it. If the result differs from your intent, fix the rule before deploying.
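The verdict logic described above can be sketched in a few lines of Python. This is a minimal illustration of Google's documented matching rules (wildcards, end anchors, longest-pattern-wins), not the validator's actual implementation; `pattern_to_regex` and `verdict` are hypothetical names.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    '*' matches any character sequence; a trailing '$' anchors the end
    of the URL path. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

def verdict(path, rules):
    """Return ('allowed'|'blocked', deciding_rule) for one user-agent's rules.

    rules: list of (directive, pattern) tuples, e.g. ("disallow", "/admin/").
    Google picks the matching rule with the longest pattern; on a length
    tie, the less restrictive (allow) rule wins. No match means allowed.
    """
    matches = [(d, p) for d, p in rules
               if p and pattern_to_regex(p).match(path)]
    if not matches:
        return "allowed", None
    directive, pattern = max(matches, key=lambda m: (len(m[1]), m[0] == "allow"))
    return ("allowed" if directive == "allow" else "blocked"), (directive, pattern)

rules = [("disallow", "/products/"), ("allow", "/products/featured/")]
print(verdict("/products/featured/sale", rules))
# → ('allowed', ('allow', '/products/featured/'))
```

Note that the deciding rule is returned alongside the verdict — that is what lets a validator point at the exact line to fix.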
Why validating robots.txt before deploy is non-negotiable
A misconfigured robots.txt can knock your entire site out of Google's crawl within days. The fix is fast (one file edit), but the recovery — getting Google to re-crawl and re-index — takes weeks. Always validate before deploy.
Common robots.txt syntax mistakes
- `Disallow: /` — blocks the entire site (catastrophic on production).
- Trailing space on rules — `Disallow: /admin ` (with a trailing space) doesn't match `/admin`.
- Path vs URL confusion — robots.txt rules match paths, not full URLs.
- Wildcard misuse — `*.pdf` doesn't work; the correct syntax is `/*.pdf$`.
- Allow + Disallow conflicts — when both match, the longer rule wins (not the first).
- Forgotten User-agent line — rules without a User-agent above them are ignored.
How Google interprets conflicts
When a URL matches both an Allow and a Disallow rule, Google applies the rule with the longer pattern, not the one that appears first. With `Disallow: /products/` and `Allow: /products/featured/`, the URL `/products/featured/` is allowed, because `/products/featured/` is the longer pattern. Test this case — it's easy to get wrong.
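The tie-break can be checked mechanically. A tiny sketch, assuming plain prefix rules with no wildcards (the rule list and path are the example from above):

```python
# Order in the file doesn't matter: only pattern length does.
rules = [("disallow", "/products/"), ("allow", "/products/featured/")]
path = "/products/featured/item-1"

# Both rules match as prefixes; pick the one with the longest pattern.
matching = [(d, p) for d, p in rules if path.startswith(p)]
directive, pattern = max(matching, key=lambda m: len(m[1]))

print(directive, pattern)   # allow /products/featured/
```

Reversing the order of `rules` gives the same result, which is exactly the point: file order is irrelevant to conflict resolution.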
Frequently asked questions
What does a robots.txt validator do?
It simulates a web crawler reading your robots.txt and tells you which URLs would be allowed or blocked under the rules. This catches misconfigurations before they affect Google's crawl — wildcard mistakes, path mismatches, allow/disallow conflicts.
Why is my page blocked when I didn't intend to?
The most common causes: a parent directory disallow (Disallow: /api/ blocks /api/public/), a wildcard rule matching unintentionally (Disallow: /*.json$ may block files you wanted indexed), or a User-agent-specific rule overriding the wildcard. The validator points to the exact rule that decided each verdict.
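The parent-directory case trips people up because `Disallow` is a path-prefix rule, not a name match. A quick illustration (paths are hypothetical examples):

```python
# "Disallow: /api/" blocks everything under /api/ as a path prefix,
# but not sibling paths that merely share the first characters.
disallowed = "/api/"
cases = {
    "/api/public/data.json": True,   # child of /api/ → blocked
    "/api-docs": False,              # different path → not blocked
    "/api": False,                   # no trailing slash → not matched by "/api/"
}
for path, expected in cases.items():
    assert path.startswith(disallowed) == expected
```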
How does Google handle conflicts in robots.txt?
When a URL matches both an Allow and a Disallow rule, Google picks the rule with the longer pattern. So Disallow: /admin/ with Allow: /admin/public/ means /admin/public/ is allowed (longer pattern). Order in the file doesn't matter.
Can I block specific user agents?
Yes. Add a separate User-agent: block per bot you want different rules for. User-agent: GPTBot followed by Disallow: / blocks ChatGPT's crawler. The wildcard User-agent: * applies to bots not explicitly listed.
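For example, a file that blocks GPTBot entirely while giving everyone else normal rules might look like this (the admin path is an example):

```
# Block OpenAI's crawler from the whole site
User-agent: GPTBot
Disallow: /

# All other bots: block only the admin area
User-agent: *
Disallow: /admin/
```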
Does robots.txt prevent indexation?
No — only crawling. A blocked page can still appear in search results if other sites link to it; Google just won't have a description. To prevent indexation, use noindex meta tags or the X-Robots-Tag HTTP header. Important: blocked pages can't be seen by Google to read the noindex tag.
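If the goal is to keep a page out of the index while still letting Google crawl it (so it can see the directive), the header route looks like this — a hypothetical nginx snippet, with `/private/` as an example path:

```nginx
location /private/ {
    # Tell crawlers not to index or follow links on these pages.
    # Do NOT also block this path in robots.txt, or Google never sees the header.
    add_header X-Robots-Tag "noindex, nofollow" always;
}
```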
Robots.txt Best Practices
The robots.txt file tells search engine crawlers which pages or sections of your site they can or cannot request. It lives at the root of your domain (e.g. https://example.com/robots.txt) and follows a simple text-based protocol.
Valid Directives
- `User-agent` — Specifies which crawler the following rules apply to. Use `*` for all bots.
- `Disallow` — Tells the bot not to crawl the specified path. An empty value means "allow all."
- `Allow` — Overrides a `Disallow` for a more specific path (supported by Google and Bing).
- `Sitemap` — Points to your XML sitemap. Must be an absolute URL. Can appear anywhere in the file.
- `Crawl-delay` — Requests a delay (in seconds) between crawler requests. Not supported by Google, but used by Bing and others.
- `Host` — Specifies the preferred domain version. Used mainly by Yandex.
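Putting the directives together, a small but complete robots.txt might look like this (domain and paths are examples):

```
User-agent: *
Disallow: /admin/
Allow: /admin/help/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
```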
Common Mistakes to Avoid
- Using `Disallow: /` blocks all crawling — your pages will not be indexed.
- Forgetting the `User-agent` directive before `Disallow` or `Allow` rules.
- Using relative URLs in `Sitemap` directives — they must be absolute (start with `http`).
- Blocking CSS/JS files that search engines need to render your pages.
- Relying on robots.txt for security — it is publicly readable and not an access control mechanism.
Wildcard Patterns
Google and Bing support `*` (match any sequence) and `$` (end of URL) in path patterns. For example, `Disallow: /*.pdf$` blocks all PDF files.