Robots.txt Generator

Generate a robots.txt file for your website. Configure user-agents, allow/disallow rules, crawl-delay, and sitemap — free.


Generated robots.txt

User-agent: *
Disallow:

How to use the Robots.txt Generator

robots.txt is typically the first file a well-behaved crawler requests when it visits your site. A misconfigured one can accidentally block your entire site from search, or leave it wide open to scrapers. This generator outputs a tested baseline plus the specific rules you add.

1. Set the default rule

Start with User-agent: * and choose Allow or Disallow as the global default. For most sites, the default is Allow with specific Disallow paths added underneath.
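Written out as a file, that allow-by-default starting point is just two lines:

```
# Allow-by-default: an empty Disallow value blocks nothing
User-agent: *
Disallow:
```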

2. Add disallow paths

List paths you don't want crawled — admin panels, search result pages, faceted-navigation URLs, staging directories. Each rule is a path prefix: Disallow: /admin blocks /admin/anything, and also /administrator, because matching is by prefix.
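For instance, a default group blocking a few common paths (the paths here are illustrative; adjust them to your own site's structure):

```
User-agent: *
# Prefix match: /admin also blocks /admin/, /admin.php, /administrator
Disallow: /admin
Disallow: /search
Disallow: /staging/
```

Add a trailing slash (as on /staging/) when you want to block only the directory and not sibling paths that share the prefix.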

3. Reference your sitemap

Add Sitemap: https://yoursite.com/sitemap.xml. This is the canonical way to surface your sitemap to crawlers without relying on Search Console or Bing Webmaster Tools alone.
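Put together, a complete minimal file with the sitemap reference looks like this. The Sitemap line is independent of any user-agent group and can appear anywhere in the file; the URL must be absolute:

```
User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap.xml
```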

4. Add bot-specific rules (optional)

Block aggressive scrapers (AhrefsBot, SemrushBot) or AI crawlers (GPTBot, CCBot, ClaudeBot) by adding their User-agent block with disallow rules. Don't block Googlebot, Bingbot, or DuckDuckBot.
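Per-bot groups sit after the default. A crawler obeys the most specific group that names it, so the bots below ignore the * rules entirely:

```
User-agent: *
Disallow:

# These bots match their own group, not the * group above
User-agent: AhrefsBot
Disallow: /

User-agent: GPTBot
Disallow: /
```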

Why robots.txt matters for SEO

robots.txt is the lowest-effort, highest-leverage technical SEO file on your site. A correct one keeps search engines focused on your real content. A broken one (say, a Disallow: / left over from staging) can push your entire domain out of search results within days.

What robots.txt actually does

robots.txt is a crawl directive, not an indexing directive. It tells crawlers which URLs they're allowed to fetch. Crucially, it does not remove pages from the index — a page blocked in robots.txt can still appear in Google's results if other sites link to it (Google just won't have a description because it can't crawl the page). To deindex, use noindex meta tags or HTTP headers.

The five most common robots.txt mistakes

1. Shipping Disallow: / to production — a staging block left in place stops all crawling of the site.
2. Blocking CSS and JavaScript files — Google renders pages, and blocked assets can change how your pages are evaluated.
3. Using robots.txt to deindex — blocked pages can still appear in results; use noindex instead.
4. Putting the file anywhere but the domain root — /robots.txt is the only location crawlers check.
5. Forgetting that paths are case-sensitive — /Admin and /admin are different rules.

Should you block AI crawlers?

Up to you. Blocking GPTBot, CCBot, ClaudeBot, Google-Extended, and PerplexityBot keeps your content out of LLM training data and AI answer engines. The trade-off: you also lose visibility in those AI search experiences. For content sites, the emerging consensus is to allow LLM-search bots (PerplexityBot, ChatGPT-User) but block training-only bots (GPTBot, CCBot).
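That split policy can be sketched as follows (bot names as listed above; verify current user-agent strings against each vendor's documentation, since they change over time):

```
# Block training-only crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Search-style AI bots fall through to the default group and stay allowed
User-agent: *
Disallow:
```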

Frequently asked questions

What is robots.txt?

robots.txt is a plain-text file at the root of your domain (yoursite.com/robots.txt) that tells web crawlers which parts of your site they're allowed to fetch. It uses a simple syntax of User-agent and Allow/Disallow lines, plus an optional Sitemap reference. Every well-behaved crawler — Googlebot, Bingbot, AhrefsBot — reads it before crawling.

Where do I put the robots.txt file?

At the root of your domain, accessible at https://yoursite.com/robots.txt. Subdirectories don't work — Google only reads /robots.txt at the domain root. Each subdomain needs its own (blog.yoursite.com/robots.txt is separate from yoursite.com/robots.txt).

Does robots.txt prevent pages from being indexed?

No — robots.txt only prevents crawling. A blocked page can still appear in Google's index if other sites link to it; Google just won't have a description for it. To prevent indexing, use a noindex meta tag or an X-Robots-Tag HTTP header. Important: if you block a page in robots.txt, Google can't see the noindex tag, so the page may still index. Allow-then-noindex is the correct sequence.
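The two noindex mechanisms mentioned above look like this. Either one is sufficient on its own:

```
<!-- In the page's HTML head -->
<meta name="robots" content="noindex">
```

Or the equivalent HTTP response header, useful for non-HTML resources like PDFs:

```
X-Robots-Tag: noindex
```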

Should I block AhrefsBot, SemrushBot, and Majestic?

If you don't want competitors seeing your backlink profile or traffic estimates, blocking these is reasonable. The trade-off is that those tools rank you lower in their indexes, which can affect outreach metrics if you're trying to be discovered. Most established sites allow them; new sites focused on stealth growth sometimes block them.

Can robots.txt block AI crawlers like GPTBot and ClaudeBot?

Yes. Add User-agent: GPTBot followed by Disallow: / to block OpenAI's crawler. Same syntax for ClaudeBot, CCBot, Google-Extended, PerplexityBot. These bots respect robots.txt by policy. Keep in mind that blocking them excludes your content from those AI systems' training data and answer engines.

Want AI-generated blog content that ranks? Try Autorank free.

Get Started Free →