🤖 Crawl Control

Free robots.txt Generator

Generate a valid robots.txt file for your website. Control which pages search engine bots can crawl and index, with no coding needed.

🤖 Search Engine Access

✅ Allow All Search Engines

Googlebot, Bingbot, and all other major crawlers can access your entire site. Recommended for most websites.

🚫 Block All Robots

Prevents all crawlers from indexing your site. Use only for staging/development environments.

🧠 Block AI Training Bots

Block GPTBot (OpenAI), Claude-Web (Anthropic), CCBot, and other AI scrapers from training on your content.

📂 Block Specific Directories

Block /admin/ pages

Hides admin and login pages from search engine indexing. Recommended for all sites.

Block /login/ and /register/

Prevents login, register, and account management pages from appearing in search results.

Block internal search results (?s=, ?q=, ?search=)

Prevents internal search result pages from being indexed, which can cause duplicate content issues.

Block /cart/ and /checkout/ (eCommerce)

Prevents shopping cart and checkout pages from being indexed. Recommended for ecommerce sites.

Block WordPress system files

Blocks /wp-admin/, /wp-includes/, and xmlrpc.php. Recommended if your site runs on WordPress.
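As a sketch, the directory-blocking options above produce rules along these lines (the exact output depends on which options you tick, and the paths shown are the ones named above):

```
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /register/
Disallow: /*?s=
Disallow: /*?q=
Disallow: /*?search=
Disallow: /cart/
Disallow: /checkout/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
```

The `/*?s=` pattern uses the wildcard syntax that Googlebot and Bingbot support for matching query strings anywhere in a URL.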

➕ Block Custom Paths

What Is robots.txt and Why Does It Matter?

The robots.txt file is a plain text file placed at the root of your website (e.g. yoursite.com/robots.txt) that tells search engine crawlers which pages they are allowed to access and which they should ignore. It follows the Robots Exclusion Standard, a protocol respected by all major crawlers including Googlebot, Bingbot, and Yandex.

A correctly configured robots.txt file does two important things: it protects pages you don't want indexed (admin panels, login pages, staging environments) and it conserves your crawl budget, the number of pages Google will crawl on your site per day. Wasting crawl budget on irrelevant pages means your important content gets crawled less frequently.

After setting up your robots.txt, run a full SEO Audit to confirm your important pages are being crawled and indexed correctly.

How robots.txt Affects Your SEO

💰
Crawl Budget Optimisation
Large sites have a limited crawl budget. Blocking low-value pages (search results, filters, duplicates) ensures Google spends more time crawling your important money pages.
🛡️
Security & Privacy
Blocking /admin/, /login/, and similar paths prevents these URLs from appearing in Google's index, reducing your attack surface and keeping sensitive interfaces private.
📋
Duplicate Content Prevention
Blocking URL parameters like ?sort=, ?filter=, and internal search results (?s=) prevents Google from indexing near-duplicate versions of your pages that dilute ranking authority.
🤖
Block AI Training Scrapers
Block GPTBot, CCBot, Claude-Web, and other AI training bots from using your content to train large language models, without affecting your Google rankings.
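The rules generated for this option look like the following. Each bot gets its own User-agent group; the names shown are the ones listed above, and new AI crawlers appear regularly, so treat the list as a starting point:

```
# Block common AI training crawlers; search bots are unaffected
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Bytespider
Disallow: /
```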

robots.txt Syntax Explained

# This is a comment, ignored by crawlers
User-agent: *      # Applies to ALL crawlers
Allow: /           # Allow crawling of all pages
Disallow: /admin/  # Block the /admin/ directory

User-agent: GPTBot  # Applies only to OpenAI's crawler
Disallow: /         # Block GPTBot from entire site

Sitemap: https://yoursite.com/sitemap.xml
User-agent
Specifies which crawler the rules apply to. Use * for all crawlers, or a specific name like Googlebot for Google only.
Disallow
Blocks the specified path. Disallow: / blocks everything. Disallow: /admin/ only blocks the admin directory.
Allow
Overrides a Disallow rule for a specific path. Useful when you've blocked a directory but want to allow one specific file within it.
Sitemap
Points crawlers to your XML sitemap. This helps Google discover all your pages faster, which is especially important for large or newly launched sites.
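For example, the Allow directive described above lets you carve an exception out of a blocked directory (the paths here are illustrative):

```
User-agent: *
Disallow: /private/            # Block the whole directory...
Allow: /private/press-kit.pdf  # ...except this one file
```

When Allow and Disallow rules both match a URL, Google applies the most specific (longest) matching rule, so the file stays crawlable.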

Critical robots.txt Mistakes to Avoid

โŒ
Blocking your entire site
Disallow: / under User-agent: * blocks Google from crawling everything. This is a common mistake on WordPress sites during development that gets forgotten and left live.
โŒ
Blocking CSS and JavaScript files
Google needs to render your pages to evaluate them. Blocking /wp-content/, /assets/, or style files prevents Google from seeing how your page actually looks โ€” hurting rankings.
โŒ
Thinking robots.txt is secure
robots.txt is publicly visible at yoursite.com/robots.txt. Malicious bots ignore it entirely. Never rely on robots.txt to hide sensitive content โ€” use proper authentication instead.
โŒ
Confusing robots.txt with noindex
robots.txt blocks crawling but pages can still be indexed if other sites link to them. Use noindex meta tags for pages you want crawled but not indexed.
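The noindex directive mentioned above goes in the page's HTML head (or can be sent as an X-Robots-Tag HTTP header):

```html
<!-- Allow crawling, but keep this page out of search results -->
<meta name="robots" content="noindex">
```

Note that Google can only see this tag if it is allowed to crawl the page, so don't also block the URL in robots.txt.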

⚡ Audit Your Full Technical SEO

Check if your robots.txt is blocking important pages, plus 50+ other technical SEO factors, free of charge.

🚀 Run Free SEO Audit →

Frequently Asked Questions

What is robots.txt?

robots.txt is a text file at your website's root that instructs search engine crawlers which pages they can access. It follows the Robots Exclusion Standard, respected by Googlebot, Bingbot, Yandex, and all major crawlers.

Does robots.txt affect Google rankings?

Yes, directly. Blocking important pages prevents them from being indexed and ranking. Accidentally blocking your homepage or key landing pages via robots.txt is one of the most damaging SEO mistakes possible.

Is robots.txt the same as noindex?

No. robots.txt blocks crawling, so Google never visits the page. A noindex meta tag allows crawling but prevents indexing. The two don't combine well: if robots.txt blocks a page, Google can never see its noindex tag, and the page may still be indexed from external links. To keep a page out of the index reliably, allow crawling and use noindex.

How do I check if my robots.txt is working?

Go to Google Search Console → Settings → robots.txt. You can test specific URLs to see if they're blocked. Alternatively, visit yoursite.com/robots.txt directly in your browser to see the current file.
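You can also test rules offline with Python's standard library, which implements the same Robots Exclusion Standard. This is a minimal sketch; paste your own file's contents in place of the sample rules:

```python
# Check robots.txt rules locally using Python's built-in parser.
from urllib.robotparser import RobotFileParser

# Sample rules for illustration; substitute your real robots.txt here.
rules = """\
User-agent: *
Disallow: /admin/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# can_fetch(user_agent, url) returns True if crawling is permitted.
print(rp.can_fetch("*", "https://yoursite.com/"))        # True
print(rp.can_fetch("*", "https://yoursite.com/admin/"))  # False
```

`RobotFileParser` can also fetch a live file via `set_url(...)` followed by `read()`, if you prefer to test against the deployed version.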

Can I block specific bots with robots.txt?

Yes. Each User-agent block targets a specific crawler by name. Common ones to block: GPTBot (OpenAI), CCBot (Common Crawl), Claude-Web (Anthropic), Bytespider (TikTok). Our generator includes AI bot blocking as a one-click option.

Where exactly should I upload my robots.txt file?

The file must be at the root of your domain: yourwebsite.com/robots.txt, not in a subfolder. If your site is in a subdirectory (yoursite.com/blog/), a robots.txt in that folder will not be recognised by Google.