How to use the Robots.txt Validator
robots.txt syntax has subtleties — wildcards, end-anchors, conflicting allow/disallow rules. The validator simulates a crawler request and tells you whether a specific URL would be allowed or blocked.
Paste your robots.txt content
Either paste the full robots.txt text or fetch from a live URL. The validator parses it the way Google's crawler does.
Enter test URLs
Add the URLs you want to verify — admin paths you intend to block, public paths you intend to allow, and any edge cases (URLs with query strings, trailing slashes).
Pick the user-agent
Test as Googlebot, Bingbot, AhrefsBot, GPTBot, etc. User-agent-specific rules override the wildcard * block, so per-bot testing reveals conflicts.
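The override behavior is easiest to see in a file with two groups. A minimal illustration (the paths are examples, not recommendations):

```
User-agent: *
Disallow: /private/

# Googlebot matches its own group and ignores the * group above,
# so /private/ is NOT blocked for Googlebot — only /drafts/ is.
User-agent: Googlebot
Disallow: /drafts/
```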
Review the verdict per URL
Each URL shows ALLOWED or BLOCKED, plus the specific rule that decided it. If the result differs from your intent, fix the rule before deploying.
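The verdict logic described above can be sketched in a few lines of Python. This is a minimal illustration of Google's documented matching rules (wildcards, end anchors, longest-pattern-wins), not the validator's actual implementation; `pattern_to_regex` and `verdict` are hypothetical names.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    '*' matches any character sequence; a trailing '$' anchors the end
    of the URL path. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

def verdict(path, rules):
    """Return ('allowed'|'blocked', deciding_rule) for one user-agent's rules.

    rules: list of (directive, pattern) tuples, e.g. ("disallow", "/admin/").
    Google picks the matching rule with the longest pattern; on a length
    tie, the less restrictive (allow) rule wins. No match means allowed.
    """
    matches = [(d, p) for d, p in rules
               if p and pattern_to_regex(p).match(path)]
    if not matches:
        return "allowed", None
    directive, pattern = max(matches, key=lambda m: (len(m[1]), m[0] == "allow"))
    return ("allowed" if directive == "allow" else "blocked"), (directive, pattern)

rules = [("disallow", "/products/"), ("allow", "/products/featured/")]
print(verdict("/products/featured/sale", rules))
# → ('allowed', ('allow', '/products/featured/'))
```

Note that the deciding rule is returned alongside the verdict — that is what lets a validator point at the exact line to fix.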
Why validating robots.txt before deploy is non-negotiable
A misconfigured robots.txt can knock your entire site out of Google's crawl within days. The fix is fast (one file edit), but the recovery — getting Google to re-crawl and re-index — takes weeks. Always validate before deploy.
Common robots.txt syntax mistakes
- `Disallow: /` — blocks the entire site (catastrophic on production).
- Trailing space on rules — `Disallow: /admin ` (with a trailing space) doesn't match `/admin`.
- Path vs URL confusion — robots.txt rules match paths, not full URLs.
- Wildcard misuse — `*.pdf` doesn't work; the correct syntax is `/*.pdf$`.
- Allow + Disallow conflicts — when both match, the longer rule wins (not the first).
- Forgotten User-agent line — rules without a User-agent above them are ignored.
How Google interprets conflicts
When a URL matches both an Allow and a Disallow rule, Google applies the rule with the longer pattern, not the one that appears first. With `Disallow: /products/` and `Allow: /products/featured/`, the URL `/products/featured/` is allowed, because `/products/featured/` is the longer pattern. Test this case — it's easy to get wrong.
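The tie-break can be checked mechanically. A tiny sketch, assuming plain prefix rules with no wildcards (the rule list and path are the example from above):

```python
# Order in the file doesn't matter: only pattern length does.
rules = [("disallow", "/products/"), ("allow", "/products/featured/")]
path = "/products/featured/item-1"

# Both rules match as prefixes; pick the one with the longest pattern.
matching = [(d, p) for d, p in rules if path.startswith(p)]
directive, pattern = max(matching, key=lambda m: len(m[1]))

print(directive, pattern)   # allow /products/featured/
```

Reversing the order of `rules` gives the same result, which is exactly the point: file order is irrelevant to conflict resolution.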
Frequently asked questions
What does a robots.txt validator do?
It simulates a web crawler reading your robots.txt and tells you which URLs would be allowed or blocked under the rules. This catches misconfigurations before they affect Google's crawl — wildcard mistakes, path mismatches, allow/disallow conflicts.
Why is my page blocked when I didn't intend to?
The most common causes: a parent directory disallow (Disallow: /api/ blocks /api/public/), a wildcard rule matching unintentionally (Disallow: /*.json$ may block files you wanted indexed), or a User-agent-specific rule overriding the wildcard. The validator points to the exact rule that decided each verdict.
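The parent-directory case trips people up because `Disallow` is a path-prefix rule, not a name match. A quick illustration (paths are hypothetical examples):

```python
# "Disallow: /api/" blocks everything under /api/ as a path prefix,
# but not sibling paths that merely share the first characters.
disallowed = "/api/"
cases = {
    "/api/public/data.json": True,   # child of /api/ → blocked
    "/api-docs": False,              # different path → not blocked
    "/api": False,                   # no trailing slash → not matched by "/api/"
}
for path, expected in cases.items():
    assert path.startswith(disallowed) == expected
```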
How does Google handle conflicts in robots.txt?
When a URL matches both an Allow and a Disallow rule, Google picks the rule with the longer pattern. So Disallow: /admin/ with Allow: /admin/public/ means /admin/public/ is allowed (longer pattern). Order in the file doesn't matter.
Can I block specific user agents?
Yes. Add a separate User-agent: block per bot you want different rules for. User-agent: GPTBot followed by Disallow: / blocks ChatGPT's crawler. The wildcard User-agent: * applies to bots not explicitly listed.
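For example, a file that blocks GPTBot entirely while giving everyone else normal rules might look like this (the admin path is an example):

```
# Block OpenAI's crawler from the whole site
User-agent: GPTBot
Disallow: /

# All other bots: block only the admin area
User-agent: *
Disallow: /admin/
```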
Does robots.txt prevent indexation?
No — only crawling. A blocked page can still appear in search results if other sites link to it; Google just won't have a description. To prevent indexation, use noindex meta tags or the X-Robots-Tag HTTP header. Important: blocked pages can't be seen by Google to read the noindex tag.
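If the goal is to keep a page out of the index while still letting Google crawl it (so it can see the directive), the header route looks like this — a hypothetical nginx snippet, with `/private/` as an example path:

```nginx
location /private/ {
    # Tell crawlers not to index or follow links on these pages.
    # Do NOT also block this path in robots.txt, or Google never sees the header.
    add_header X-Robots-Tag "noindex, nofollow" always;
}
```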
Robots.txt Best Practices
The robots.txt file tells search engine crawlers which pages or sections of your site they can or cannot request. It lives at the root of your domain (e.g. https://example.com/robots.txt) and follows a simple text-based protocol.
Valid Directives
- `User-agent` — Specifies which crawler the following rules apply to. Use `*` for all bots.
- `Disallow` — Tells the bot not to crawl the specified path. An empty value means "allow all."
- `Allow` — Overrides a `Disallow` for a more specific path (supported by Google and Bing).
- `Sitemap` — Points to your XML sitemap. Must be an absolute URL. Can appear anywhere in the file.
- `Crawl-delay` — Requests a delay (in seconds) between crawler requests. Not supported by Google, but used by Bing and others.
- `Host` — Specifies the preferred domain version. Used mainly by Yandex.
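Putting the directives together, a small but complete robots.txt might look like this (domain and paths are examples):

```
User-agent: *
Disallow: /admin/
Allow: /admin/help/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
```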
Common Mistakes to Avoid
- Using `Disallow: /` blocks all crawling — your pages will not be indexed.
- Forgetting the `User-agent` directive before `Disallow` or `Allow` rules.
- Using relative URLs in `Sitemap` directives — they must be absolute (start with `http`).
- Blocking CSS/JS files that search engines need to render your pages.
- Relying on robots.txt for security — it is publicly readable and not an access control mechanism.
Wildcard Patterns
Google and Bing support `*` (match any sequence) and `$` (end of URL) in path patterns. For example, `Disallow: /*.pdf$` blocks all PDF files.