Robots.txt Tester

Test whether a URL path is allowed or blocked for a specific user-agent based on robots.txt rules.

About Robots.txt Tester

This robots.txt tester simulates how search engine crawlers interpret robots.txt rules. It parses your robots.txt content, finds rules for a specific user-agent, and determines whether a given URL path would be allowed or blocked based on the most specific matching rule.

It is useful for troubleshooting why pages aren't being indexed, verifying new robots.txt rules before deployment, testing edge cases with Allow/Disallow conflicts, debugging crawler access issues, and ensuring important pages aren't accidentally blocked from search engines.

How Path Matching Works

The tester uses the same matching logic as search engine crawlers:

Matching Algorithm:

1. Parse robots.txt into rule groups by user-agent
2. Find rules for the specified user-agent
3. If no specific rules, fall back to "*" (wildcard)
4. Check all Allow and Disallow rules against the path
5. Find the LONGEST matching path (most specific rule)
6. Apply that rule: Allow = accessible, Disallow = blocked
7. If no rules match, URL is ALLOWED by default

Example:
  User-agent: Googlebot
  Disallow: /admin/
  Disallow: /private/
  Allow: /admin/public/

Test Path: /admin/public/page.html
  - Matches: Disallow: /admin/ (length 7)
  - Matches: Allow: /admin/public/ (length 14)
  - Winner: Allow (longer path)
  - Result: ALLOWED

Test Path: /admin/settings
  - Matches: Disallow: /admin/ (length 7)
  - No Allow rule matches
  - Result: BLOCKED

Test Path: /public/page
  - No matching rules
  - Result: ALLOWED (default)
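
The steps above can be sketched in a few lines of Python. This is a minimal illustration of the longest-match logic, not the tool's actual code; the function name and tuple-based tie-break are my own. On a length tie it prefers Allow, the least restrictive rule.

```python
def check_path(rules, path):
    """Decide ALLOWED/BLOCKED for a URL path against (directive, value) rules.

    Longest-match semantics: the most specific matching rule wins, and on a
    length tie the least restrictive rule (Allow) wins. If nothing matches,
    the path is allowed by default.
    """
    best = None  # (length, is_allow, value) -- tuple ordering does the tie-break
    for directive, value in rules:
        if value and path.startswith(value):
            candidate = (len(value), directive == "Allow", value)
            if best is None or candidate > best:
                best = candidate
    if best is None:
        return "ALLOWED", None           # no rule matched: allowed by default
    _, is_allow, value = best
    return ("ALLOWED" if is_allow else "BLOCKED"), value

rules = [
    ("Disallow", "/admin/"),
    ("Disallow", "/private/"),
    ("Allow", "/admin/public/"),
]
print(check_path(rules, "/admin/public/page.html"))  # ('ALLOWED', '/admin/public/')
print(check_path(rules, "/admin/settings"))          # ('BLOCKED', '/admin/')
print(check_path(rules, "/public/page"))             # ('ALLOWED', None)
```

The three calls reproduce the three worked test paths above: the 14-character Allow beats the 7-character Disallow, `/admin/settings` matches only the Disallow, and `/public/page` matches nothing.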

Test Result Examples

Example 1: Simple Block
robots.txt:
  User-agent: *
  Disallow: /private/

Test: User-agent = Googlebot, Path = /private/secret
Result: BLOCKED
Matched rule: DISALLOW: /private/

Example 2: Allow Override
robots.txt:
  User-agent: *
  Disallow: /admin/
  Allow: /admin/public/

Test: User-agent = Googlebot, Path = /admin/public/docs
Result: ALLOWED
Matched rule: ALLOW: /admin/public/

Example 3: Specific User-Agent
robots.txt:
  User-agent: Googlebot
  Disallow: /no-google/

  User-agent: Bingbot
  Disallow: /no-bing/

Test: User-agent = Googlebot, Path = /no-google/page
Result: BLOCKED
Matched rule: DISALLOW: /no-google/

Test: User-agent = Bingbot, Path = /no-google/page
Result: ALLOWED (no matching rule for Bingbot)

Example 4: No Match = Allowed
robots.txt:
  User-agent: *
  Disallow: /admin/

Test: User-agent = Googlebot, Path = /public/page
Result: ALLOWED
Reason: No matching rule found
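
The user-agent grouping and wildcard fallback in Examples 3 and 4 can be sketched with a simplified parser. The helper names are my own, and this is deliberately minimal: real parsers also handle wildcards, BOMs, and longest-prefix agent-name matching.

```python
def parse_groups(text):
    """Parse robots.txt into {agent_name_lower: [(directive, path), ...]}."""
    groups = {}
    current_agents = []     # agents that consecutive User-agent lines share
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # comments are ignored
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not last_was_agent:            # a new rule group starts here
                current_agents = []
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
    return groups

robots = """
User-agent: Googlebot
Disallow: /no-google/

User-agent: Bingbot
Disallow: /no-bing/
"""
groups = parse_groups(robots)
rules = groups.get("bingbot", groups.get("*", []))  # fall back to '*' if absent
print(rules)  # [('disallow', '/no-bing/')]
```

Because Bingbot's group contains no rule matching `/no-google/`, that path is allowed for Bingbot, exactly as in Example 3.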

Common User-Agent Strings

  Search Engine    User-Agent Value    Test With
  Google (Main)    Googlebot           Googlebot
  Google Images    Googlebot-Image     Googlebot-Image
  Google News      Googlebot-News      Googlebot-News
  Bing             bingbot             bingbot
  Yahoo            Slurp               Slurp
  DuckDuckGo       DuckDuckBot         DuckDuckBot
  Baidu            Baiduspider         Baiduspider
  Yandex           YandexBot           YandexBot
  All Crawlers     *                   * or any agent

Path Matching Scenarios

Scenario 1: Exact Directory Match
  Rule: Disallow: /admin/
  /admin/        → BLOCKED
  /admin/page    → BLOCKED
  /admin         → ALLOWED (no trailing slash)
  /administrator → ALLOWED (different path)

Scenario 2: File Extension Block
  Rule: Disallow: /*.pdf$
  /file.pdf      → BLOCKED
  /file.pdf?id=1 → ALLOWED if the crawler supports $ (the URL does not
                   end in .pdf); BLOCKED if the crawler ignores $
  /file.html     → ALLOWED
  /pdf/file.html → ALLOWED

Scenario 3: Nested Allow/Disallow
  Rules:
    Disallow: /files/
    Allow: /files/public/

  /files/            → BLOCKED
  /files/private/    → BLOCKED
  /files/public/     → ALLOWED
  /files/public/doc  → ALLOWED

Scenario 4: Multiple Rules Same Path
  Rules:
    Allow: /page
    Disallow: /page

  /page → ALLOWED (on a length tie, the least restrictive rule, Allow, wins)

Scenario 5: Wildcard Patterns
  Rule: Disallow: /*?
  /page          → ALLOWED (no query string)
  /page?id=1     → BLOCKED (has query parameter)
  /page?a=1&b=2  → BLOCKED

Note: Wildcards (* and $) are supported by major crawlers such as
Google and Bing and are standardized in RFC 9309, but some crawlers
still ignore them and treat the rule as a literal prefix.
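
For crawlers that do honor wildcards, the matching in Scenarios 2 and 5 can be sketched by translating the rule path into a regular expression. This is an illustrative approach, not how this tester works (it uses plain prefix matching, per the note above), and the function name is my own.

```python
import re

def pattern_matches(rule_path, url_path):
    """Check a robots.txt rule path (with * and $ wildcards) against a URL path.

    '*' matches any sequence of characters; a trailing '$' anchors the end of
    the URL. Without '$', prefix matching applies (re.match anchors the start).
    """
    pattern = re.escape(rule_path).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"   # turn the escaped '$' back into an anchor
    return re.match(pattern, url_path) is not None

# Scenario 2: Disallow: /*.pdf$
print(pattern_matches("/*.pdf$", "/file.pdf"))       # True  -> blocked
print(pattern_matches("/*.pdf$", "/file.pdf?id=1"))  # False -> allowed
print(pattern_matches("/*.pdf$", "/pdf/file.html"))  # False -> allowed

# Scenario 5: Disallow: /*?
print(pattern_matches("/*?", "/page"))               # False -> allowed
print(pattern_matches("/*?", "/page?id=1"))          # True  -> blocked
```

The `re.escape` call keeps literal characters like `.` and `?` in the rule from being treated as regex metacharacters before the wildcard substitution.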

Troubleshooting Guide

  Problem                  Possible Cause                        Solution
  Page not indexed         Accidentally blocked by robots.txt    Test the URL path; remove or adjust the Disallow rule
  Wrong crawler blocked    Using * instead of a specific agent   Create crawler-specific rule groups
  Allow not working        Allow path shorter than Disallow      Make the Allow path more specific (longer)
  Unrelated paths blocked  Missing trailing slash on Disallow    Use /admin/ rather than /admin so paths like /administrator stay allowed
  Query strings blocked    Wildcard rule affecting URLs with ?   Review /* patterns; test with query strings

Robots.txt Syntax Reference

Valid Directives:
  User-agent: [crawler-name]  - Specifies which crawler
  Disallow: [path]            - Blocks access to path
  Allow: [path]               - Permits access to path
  Sitemap: [url]              - Sitemap location
  Crawl-delay: [seconds]      - Request delay (ignored by Google)
  Host: [domain]              - Preferred domain (Yandex only)

Comments:
  # This is a comment
  # Comments are ignored by crawlers
  # Use comments to document your rules

Syntax Rules:
  - One directive per line
  - Format: Directive: value
  - No quotes around values
  - Case-sensitive paths
  - Paths must start with /
  - Empty Disallow means "allow all"
  - Lines are case-insensitive for directives

Invalid Syntax (will be ignored):
  Disallow:admin      # Path missing the leading /
  Disallow: admin     # Relative path (invalid)
  "Disallow: /admin"  # Quotes included (invalid)

Note: a lowercase directive (disallow: /admin) is valid, since
directive names are case-insensitive.

Frequently Asked Questions

How does the robots.txt path matching algorithm work?
Path matching finds the most specific rule that applies to a URL. Rules are matched by checking if the URL path starts with the Disallow or Allow path value. When multiple rules match, the longest (most specific) path wins. For example, if Disallow: /admin/ and Allow: /admin/public/ both match, the longer Allow rule takes precedence for /admin/public/page.
What happens if no rule matches my URL?
If no Disallow or Allow rule matches the tested URL path, the URL is considered allowed by default. Search crawlers can access and index URLs that aren't explicitly blocked. This is why an empty robots.txt file (or one with only 'User-agent: *' and no Disallow directives) allows full site access.
How do I test rules for a specific crawler?
Enter the crawler's user-agent name (e.g., Googlebot, bingbot, Slurp) in the User-agent field. The tester first looks for rules specific to that user-agent. If no specific rules exist, it falls back to rules for '*' (wildcard), which applies to all crawlers. This matches how real crawlers interpret robots.txt.
Do wildcards (*) work in robots.txt testing?
Google supports * (matches any sequence) and $ (end-of-URL anchor) wildcards in robots.txt. However, not all crawlers support wildcards. This tester uses simple prefix matching for compatibility. For wildcard testing, verify with Google Search Console's robots.txt tester for Googlebot-specific behavior.
What is the order of rule evaluation?
Crawlers evaluate rules in order: 1) Find the user-agent group (specific agent or wildcard), 2) Collect all Allow and Disallow rules for that group, 3) Find rules that match the URL path, 4) Select the longest matching path, 5) Apply that rule (Allow or Disallow). If no rules match, access is allowed.
Why is my URL blocked even though I have an Allow rule?
The Allow rule might be less specific than a Disallow rule. For example, with 'Allow: /page' and 'Disallow: /page/', the URL '/page/sub' matches both, but '/page/' is longer, so Disallow wins. Make the Allow path at least as long as the Disallow rule it needs to override. Under longest-match semantics the order of rules does not matter, and on an exact length tie the Allow rule wins.
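
The length comparison described above can be checked in a few self-contained lines (an illustrative sketch; the function name is my own):

```python
def longest_match(rules, path):
    """Pick the longest rule whose path prefixes `path`; Allow wins length ties."""
    matches = [(len(v), d == "allow", d, v) for d, v in rules if path.startswith(v)]
    if not matches:
        return ("allow", None)        # no match: allowed by default
    return max(matches)[2:]           # -> (directive, matched_path)

# 'Allow: /page' (length 5) vs 'Disallow: /page/' (length 6) for '/page/sub'
print(longest_match([("allow", "/page"), ("disallow", "/page/")], "/page/sub"))
# ('disallow', '/page/') -- the longer Disallow wins, so '/page/sub' is blocked
```

Lengthening the Allow path to `/page/` would produce a tie, which the least restrictive rule (Allow) wins.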