Robots.txt Generator

Create robots.txt files with User-agent, Allow, Disallow, and Sitemap directives for search engine crawlers.



About Robots.txt Generator

This robots.txt generator creates properly formatted robots.txt files using the Robots Exclusion Protocol. It supports User-agent, Allow, Disallow, and Sitemap directives, helping you control how search engine crawlers access and index your website content.

It is useful for blocking admin panels from search results, preventing duplicate content indexing, directing crawlers to your sitemap, managing crawl budget on large sites, protecting sensitive directories, and ensuring search engines focus on your most important content.

Robots.txt File Structure

A robots.txt file follows a simple line-based format:

Basic Structure:
  User-agent: [crawler-name]
  Disallow: [path]
  Allow: [path]

  Sitemap: [sitemap-url]

Example robots.txt:
  User-agent: *
  Disallow: /admin/
  Disallow: /private/
  Allow: /public/

  Sitemap: https://example.com/sitemap.xml

Key Points:
  - Each directive on its own line
  - Format: Directive: value
  - Blank lines separate rule groups
  - Lines starting with # are comments
  - File must be in website root (/robots.txt)
  - Case-sensitive paths (/Admin/ ≠ /admin/)
  - No quotes around values
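The structure above is simple enough to assemble programmatically. Here is a minimal Python sketch of such a generator; `build_robots_txt` and its parameters are illustrative names, not part of any library:

```python
def build_robots_txt(user_agent="*", disallow=None, allow=None, sitemaps=None):
    """Assemble a robots.txt string, one directive per line."""
    lines = [f"User-agent: {user_agent}"]
    for path in disallow or []:
        lines.append(f"Disallow: {path}")
    for path in allow or []:
        lines.append(f"Allow: {path}")
    if sitemaps:
        lines.append("")  # blank line separates the rule group from sitemaps
        lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines) + "\n"

print(build_robots_txt(disallow=["/admin/", "/private/"],
                       allow=["/public/"],
                       sitemaps=["https://example.com/sitemap.xml"]))
```

This reproduces the example file shown above: directives one per line, no quotes, and the Sitemap line separated by a blank line.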

Common Crawler User-Agent Names

  Search Engine      User-Agent String     Notes
  Google (General)   Googlebot             Main Google crawler
  Google Images      Googlebot-Image       Indexes images for Google Images
  Google News        Googlebot-News        Indexes news content
  Google Video       Googlebot-Video       Indexes video content
  Bing               bingbot               Microsoft Bing crawler
  Yahoo              Slurp                 Yahoo Search (powered by Bing)
  DuckDuckGo         DuckDuckBot           DuckDuckGo search engine
  Baidu              Baiduspider           Chinese search engine
  Yandex             YandexBot             Russian search engine
  Facebook           facebookexternalhit   Link preview generator
  Twitter            Twitterbot            Twitter card preview
  All Crawlers       *                     Wildcard for all crawlers

Common Robots.txt Examples

Example 1: Allow All Crawlers (Default)
  User-agent: *
  Disallow:

  Sitemap: https://example.com/sitemap.xml

Example 2: Block All Crawlers
  User-agent: *
  Disallow: /

Example 3: Block Specific Folders
  User-agent: *
  Disallow: /admin/
  Disallow: /private/
  Disallow: /temp/
  Disallow: /cgi-bin/

Example 4: Block Specific Crawler
  User-agent: Googlebot
  Disallow: /admin/

  User-agent: *
  Disallow: /private/

Example 5: Allow Subfolder in Blocked Directory
  User-agent: *
  Disallow: /admin/
  Allow: /admin/public/

Example 6: Block Specific File Types
  User-agent: *
  Disallow: /*.pdf$
  Disallow: /*.doc$
  Disallow: /*.xls$

Example 7: Multiple Sitemaps
  User-agent: *
  Disallow:

  Sitemap: https://example.com/sitemap.xml
  Sitemap: https://example.com/sitemap-news.xml
  Sitemap: https://example.com/sitemap-images.xml
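Rule sets like these can be sanity-checked locally with Python's standard-library `urllib.robotparser`. Note that this parser uses simple prefix matching and does not implement Google's * and $ wildcards, so it is best suited to plain path rules:

```python
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())  # parse rules from a list of lines

for url in ("https://example.com/admin/login",
            "https://example.com/blog/post"):
    verdict = "allowed" if rp.can_fetch("*", url) else "blocked"
    print(url, "->", verdict)
```

Here `/admin/login` is blocked because it begins with the disallowed prefix `/admin/`, while `/blog/post` matches no rule and is allowed.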

Path Pattern Matching

Path Matching Rules:

1. Prefix Match (no trailing slash):
   Disallow: /admin
   Blocks: /admin, /admin/, /admin/page, and even /administrator
   (robots.txt rules are prefix matches, not exact matches)

2. Directory Match (with trailing slash):
   Disallow: /admin/
   Blocks: /admin/, /admin/page, /admin/anything
   Does NOT block: /admin (no trailing slash) or /administrator

3. Wildcard Match (*):
   Disallow: /*.pdf
   Blocks: All URLs ending in .pdf
   Disallow: /admin/*
   Blocks: Everything under /admin/

4. End-of-URL Anchor ($):
   Disallow: /*.pdf$
   Blocks: URLs that END with .pdf
   Disallow: /page.html$
   Blocks: Only /page.html at URL end

5. Pattern Examples:
   /admin          - Matches any path starting with /admin
   /admin/         - Matches the directory and its contents
   /*.pdf$         - Matches URLs ending in .pdf
   /admin/*.html   - Matches HTML files under /admin/
   /*?             - Matches any URL with a query string (? is literal)
   /page*          - Matches /page, /page1, /page2, etc. (same as /page)

Important: Google and Bing support the * and $ wildcards, but they are
not part of the original Robots Exclusion Protocol; other crawlers may
treat them as literal characters.
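As a rough illustration of the wildcard semantics above, Google-style path patterns can be translated to regular expressions. This is a simplified sketch (`pattern_to_regex` is a hypothetical helper) and omits details real crawlers apply, such as longest-match precedence between Allow and Disallow:

```python
import re

def pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern to a regex.

    '*' matches any run of characters; a trailing '$' anchors the end
    of the URL. Everything else (including '?') is matched literally,
    and patterns always match from the start of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

matcher = pattern_to_regex("/*.pdf$")
print(bool(matcher.match("/files/report.pdf")))      # True: ends in .pdf
print(bool(matcher.match("/files/report.pdf?v=2")))  # False: $ anchors the end
```

Because `re.match` anchors at the start of the string, an unanchored pattern like `/admin` behaves as a prefix match, mirroring rule 1 above.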

Robots.txt Best Practices

  Practice             Recommendation
  File location        Must be in the root directory (https://domain.com/robots.txt)
  File size            Google limit: 500 KB maximum
  Character encoding   Use UTF-8 for international characters
  Case sensitivity     Paths are case-sensitive (/Admin/ ≠ /admin/)
  Comments             Use # comments to document your rules
  Testing              Test in Google Search Console before deploying
  Sitemap location     Include a Sitemap directive to help crawlers find content
  Crawl-delay          Use sparingly; Google ignores this directive

Testing and Validation

Testing Tools:

1. Google Search Console - robots.txt report
   - Access via: Search Console → Settings → robots.txt
   - Shows the file Google fetched and its fetch status
   - Validates syntax and warns of issues

2. Bing Webmaster Tools
   - Similar testing functionality for Bing
   - Shows how bingbot interprets rules

3. Manual Testing:
   - Visit: https://yourdomain.com/robots.txt
   - Verify file is accessible (HTTP 200)
   - Check for typos and syntax errors
   - Ensure paths match your URL structure

4. URL Inspection:
   - Use Search Console URL Inspection tool
   - Check if specific URLs are blocked
   - View crawling status

Common Error Messages:
  - "Unable to fetch" - File not accessible
  - "Syntax error" - Invalid directive format
  - "Warning: Blocked by robots.txt" - URL blocked from crawling
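A minimal local lint pass can catch the most common syntax errors (missing colons, misspelled directives) before you upload. The `lint_robots_txt` helper below is illustrative, not a replacement for Search Console's validation:

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow",
                    "sitemap", "crawl-delay", "host"}

def lint_robots_txt(text):
    """Return a list of (line_number, message) warnings for common mistakes."""
    warnings = []
    for n, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank lines are legal group separators
        if ":" not in line:
            warnings.append((n, "missing ':' separator"))
            continue
        directive = line.split(":", 1)[0].strip().lower()
        if directive not in KNOWN_DIRECTIVES:
            warnings.append((n, f"unknown directive '{directive}'"))
    return warnings

print(lint_robots_txt("User-agent: *\nDisalow: /admin/"))
# [(2, "unknown directive 'disalow'")]
```

Directive names are case-insensitive, so the linter lowercases them before checking; paths themselves stay case-sensitive as noted above.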

Frequently Asked Questions

What is a robots.txt file and how does it work?
A robots.txt file is a plain text file in your website's root directory that tells search engine crawlers which URLs they can or cannot access. It uses the Robots Exclusion Protocol with directives like User-agent, Allow, Disallow, and Sitemap. Crawlers read this file before accessing your site and respect the rules specified for their user-agent.
What are the common robots.txt directives?
Common directives include: User-agent (specifies which crawler the rules apply to), Disallow (blocks access to specific paths), Allow (permits access to specific paths within a blocked directory), Sitemap (provides the XML sitemap location), Crawl-delay (requests a delay between requests; ignored by Google), and Host (a legacy directive for the preferred domain, now deprecated). Each directive appears on its own line.
How do I block all crawlers from my site?
To block all crawlers, use: User-agent: * followed by Disallow: /. This tells all crawlers they cannot access any part of your site. However, note that robots.txt is not a security mechanism - determined crawlers can ignore it, and blocked pages may still appear in search results if linked from other sites.
How do I allow all crawlers to access my site?
To allow all crawlers full access, use: User-agent: * followed by Disallow: (empty). An empty Disallow directive means nothing is blocked. Alternatively, you can omit the Disallow line entirely. You can also add Sitemap: https://example.com/sitemap.xml to help crawlers find your content.
Does robots.txt prevent pages from appearing in Google?
No, robots.txt only prevents crawling, not indexing. If a blocked page is linked from other websites, Google may still index its URL without content. To prevent indexing, use noindex meta tags or password protection. For sensitive data, use server-side authentication. Robots.txt is a request, not a security measure.
What is the difference between Allow and Disallow?
Disallow blocks crawlers from accessing specified paths. Allow permits access to specific paths that would otherwise be blocked by a broader Disallow rule. For example, Disallow: /admin/ blocks the entire admin folder, but Allow: /admin/public/ permits access to the public subfolder within admin.