Robots.txt Tester
Test whether a URL path is allowed or blocked for a specific user-agent based on robots.txt rules.
About Robots.txt Tester
This robots.txt tester simulates how search engine crawlers interpret robots.txt rules. It parses your robots.txt content, finds rules for a specific user-agent, and determines whether a given URL path would be allowed or blocked based on the most specific matching rule.
It is useful for troubleshooting why pages aren't being indexed, verifying new robots.txt rules before deployment, testing edge cases with Allow/Disallow conflicts, debugging crawler access issues, and ensuring important pages aren't accidentally blocked from search engines.
How Path Matching Works
The tester uses the same matching logic as search engine crawlers:
Matching Algorithm:
1. Parse robots.txt into rule groups by user-agent.
2. Find the rule group for the specified user-agent.
3. If no specific group exists, fall back to the "*" (wildcard) group.
4. Check all Allow and Disallow rules against the path.
5. Find the LONGEST matching path (the most specific rule).
6. Apply that rule: Allow = accessible, Disallow = blocked.
7. If no rule matches, the URL is ALLOWED by default.

Example:
User-agent: Googlebot
Disallow: /admin/
Disallow: /private/
Allow: /admin/public/

Test Path: /admin/public/page.html
- Matches: Disallow: /admin/ (length 7)
- Matches: Allow: /admin/public/ (length 14)
- Winner: Allow (longer path)
- Result: ALLOWED

Test Path: /admin/settings
- Matches: Disallow: /admin/ (length 7)
- No Allow rule matches
- Result: BLOCKED

Test Path: /public/page
- No matching rules
- Result: ALLOWED (default)
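The steps above can be sketched in Python. This is a simplified prefix-matching model of the tester's logic, not a full RFC 9309 parser; the function names are illustrative.

```python
def parse_robots(text):
    """Parse robots.txt into {agent_lowercase: [(directive, path), ...]} groups."""
    groups = {}
    current = []           # agents of the group currently being filled
    seen_rule = False      # has a rule appeared since the last User-agent line?
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # strip comments and whitespace
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent" and value:
            if seen_rule:                     # a User-agent line after rules starts a new group
                current, seen_rule = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and current:
            seen_rule = True
            if value:                         # an empty Disallow means "allow all"
                for agent in current:
                    groups[agent].append((field, value))
    return groups

def is_allowed(groups, user_agent, path):
    """Longest-match decision: the most specific matching rule wins; ties favor Allow."""
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best = None
    for directive, rule_path in rules:
        if path.startswith(rule_path):
            if (best is None or len(rule_path) > len(best[1])
                    or (len(rule_path) == len(best[1]) and directive == "allow")):
                best = (directive, rule_path)
    return True if best is None else best[0] == "allow"

robots = """User-agent: Googlebot
Disallow: /admin/
Disallow: /private/
Allow: /admin/public/"""

groups = parse_robots(robots)
print(is_allowed(groups, "Googlebot", "/admin/public/page.html"))  # True: Allow is longer
print(is_allowed(groups, "Googlebot", "/admin/settings"))          # False: Disallow matches
print(is_allowed(groups, "Googlebot", "/public/page"))             # True: no rule matches
```

Note that the decision depends only on rule length, not on the order rules appear in the file.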
Test Result Examples
Example 1: Simple Block
robots.txt:
User-agent: *
Disallow: /private/
Test: User-agent = Googlebot, Path = /private/secret
Result: BLOCKED
Matched rule: Disallow: /private/

Example 2: Allow Override
robots.txt:
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Test: User-agent = Googlebot, Path = /admin/public/docs
Result: ALLOWED
Matched rule: Allow: /admin/public/

Example 3: Specific User-Agent
robots.txt:
User-agent: Googlebot
Disallow: /no-google/
User-agent: Bingbot
Disallow: /no-bing/
Test: User-agent = Googlebot, Path = /no-google/page
Result: BLOCKED
Matched rule: Disallow: /no-google/
Test: User-agent = Bingbot, Path = /no-google/page
Result: ALLOWED (no matching rule for Bingbot)

Example 4: No Match = Allowed
robots.txt:
User-agent: *
Disallow: /admin/
Test: User-agent = Googlebot, Path = /public/page
Result: ALLOWED
Reason: No matching rule found
Common User-Agent Strings
| Search Engine | User-Agent Value | Test With |
|---|---|---|
| Google (Main) | Googlebot | Googlebot |
| Google Images | Googlebot-Image | Googlebot-Image |
| Google News | Googlebot-News | Googlebot-News |
| Bing | bingbot | bingbot |
| Yahoo | Slurp | Slurp |
| DuckDuckGo | DuckDuckBot | DuckDuckBot |
| Baidu | Baiduspider | Baiduspider |
| Yandex | YandexBot | YandexBot |
| All Crawlers | * | * or any agent |
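For quick checks against these user-agents, Python's standard library ships `urllib.robotparser`. One caveat: it evaluates rules in file order (first match wins, per the original 1994 convention) rather than Google's longest-match rule, so results can differ when Allow and Disallow rules overlap. The robots.txt content below is illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with crawler-specific groups (Example 3 above).
robots_txt = """\
User-agent: Googlebot
Disallow: /no-google/

User-agent: bingbot
Disallow: /no-bing/
"""

rfp = RobotFileParser()
rfp.parse(robots_txt.splitlines())

print(rfp.can_fetch("Googlebot", "/no-google/page"))  # False: blocked for Googlebot
print(rfp.can_fetch("bingbot", "/no-google/page"))    # True: no matching rule for bingbot
```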
Path Matching Scenarios
Scenario 1: Exact Directory Match
Rule: Disallow: /admin/
/admin/ → BLOCKED
/admin/page → BLOCKED
/admin → ALLOWED (no trailing slash)
/administrator → ALLOWED (different path)
Scenario 2: File Extension Block
Rule: Disallow: /*.pdf$
/file.pdf → BLOCKED
/file.pdf?id=1 → ALLOWED ($ anchors the end of the URL, and the query string follows .pdf)
/file.html → ALLOWED
/pdf/file.html → ALLOWED
Scenario 3: Nested Allow/Disallow
Rules:
Disallow: /files/
Allow: /files/public/
/files/ → BLOCKED
/files/private/ → BLOCKED
/files/public/ → ALLOWED
/files/public/doc → ALLOWED
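Scenario 3's resolution can be checked in a few lines of Python. The rule list and helper below are illustrative; tie-breaking between equal-length rules is omitted since this scenario has none.

```python
# Illustrative rule list for Scenario 3: longest matching prefix wins.
RULES = [("disallow", "/files/"), ("allow", "/files/public/")]

def check(path):
    matches = [(len(p), d) for d, p in RULES if path.startswith(p)]
    if not matches:
        return "ALLOWED"            # no rule matches: allowed by default
    _, directive = max(matches)     # the longest prefix is the most specific
    return "ALLOWED" if directive == "allow" else "BLOCKED"

for p in ["/files/", "/files/private/", "/files/public/", "/files/public/doc"]:
    print(p, "→", check(p))
```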
Scenario 4: Multiple Rules Same Path
Rules:
Allow: /page
Disallow: /page
/page → ALLOWED (on a length tie, crawlers apply the less restrictive rule, so Allow wins)
Scenario 5: Wildcard Patterns (not supported by all crawlers)
Rule: Disallow: /*?
/page → ALLOWED (no query string)
/page?id=1 → BLOCKED (has query parameter)
/page?a=1&b=2 → BLOCKED
Note: Wildcards (* and $) are supported by major crawlers such as Google and Bing and are standardized in RFC 9309, but older or niche crawlers may still ignore them.
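One way to test wildcard rules is to translate the pattern into a regular expression, mapping * to "any sequence" and a trailing $ to an end-of-URL anchor. The sketch below is an approximation for experimentation, not Google's actual implementation.

```python
import re

def wildcard_to_regex(rule_path):
    """Translate a robots.txt pattern with * and $ into a compiled regex.

    * matches any character sequence; a trailing $ anchors the end of the URL.
    """
    anchored = rule_path.endswith("$")
    core = rule_path[:-1] if anchored else rule_path
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in core)
    return re.compile(pattern + ("$" if anchored else ""))

pdf_rule = wildcard_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/file.pdf")))       # True: URL ends in .pdf
print(bool(pdf_rule.match("/file.pdf?id=1")))  # False: $ anchors the URL end
query_rule = wildcard_to_regex("/*?")
print(bool(query_rule.match("/page?id=1")))    # True: URL contains a query string
print(bool(query_rule.match("/page")))         # False: no "?" in the URL
```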
Troubleshooting Guide
| Problem | Possible Cause | Solution |
|---|---|---|
| Page not indexed | Accidentally blocked by robots.txt | Test URL path; remove or adjust Disallow rule |
| Wrong crawler blocked | Using * instead of specific user-agent | Create crawler-specific rule groups |
| Allow not working | Allow path shorter than Disallow | Make Allow path more specific (longer) |
| Subdirectories blocked | Missing trailing slash on Disallow | Add trailing slash: /admin/ not /admin |
| Query strings blocked | Wildcard rule affecting URLs with ? | Review /* patterns; test with query strings |
Robots.txt Syntax Reference
Valid Directives:
- User-agent: [crawler-name] - specifies which crawler the group applies to
- Disallow: [path] - blocks access to the path
- Allow: [path] - permits access to the path
- Sitemap: [url] - sitemap location
- Crawl-delay: [seconds] - request delay (ignored by Google)
- Host: [domain] - preferred domain (Yandex only)

Comments:
# This is a comment
# Comments are ignored by crawlers
# Use comments to document your rules

Syntax Rules:
- One directive per line, in the form Directive: value
- No quotes around values
- Directive names are case-insensitive; paths are case-sensitive
- Paths must start with /
- An empty Disallow value means "allow all"

Invalid Syntax (ignored by most parsers):
Disallow: admin       # relative path: must start with /
"Disallow: /admin"    # quoted value: quotes are not stripped
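A minimal line checker can flag the invalid patterns above before deployment. The directive set and messages below are illustrative, and this sketch is far from a complete validator.

```python
KNOWN = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay", "host"}

def lint_line(raw):
    """Return None if the line looks valid, else a short problem description."""
    line = raw.split("#", 1)[0].strip()    # comments and blank lines are fine
    if not line:
        return None
    field, sep, value = line.partition(":")
    field, value = field.strip().lower(), value.strip()
    if not sep or field not in KNOWN:
        return "unknown or malformed directive"
    if (field in ("disallow", "allow") and value
            and not value.startswith(("/", "*"))):
        return "path should start with / (relative paths are ignored)"
    return None

for raw in ["Disallow: /private/", "Disallow: admin", '"Disallow: /admin"', "# note"]:
    print(repr(raw), "→", lint_line(raw) or "OK")
```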
Testing Best Practices
- Test critical URLs: Always test your homepage, important landing pages, and product pages to ensure they're not blocked.
- Test multiple crawlers: If you have crawler-specific rules, test with each relevant user-agent (Googlebot, bingbot, etc.).
- Test edge cases: Check URLs at directory boundaries, with query strings, and with file extensions.
- Verify after changes: Re-test all critical URLs after modifying your robots.txt file.
- Use Google Search Console: For production sites, verify your robots.txt in Search Console's robots.txt Tester tool.
- Document your rules: Add comments explaining why certain paths are blocked for future reference.
Limitations of This Tester
- Wildcard support: This tester uses simple prefix matching. Google's advanced wildcards (* and $) are not fully simulated.
- Crawl-delay: This directive is not tested; Google ignores it anyway.
- Multiple rule groups: Complex multi-group robots.txt files may have edge cases not covered.
- Real-time fetching: This tester doesn't fetch your actual robots.txt; you must paste the content.
- Crawler behavior: Different crawlers may interpret rules slightly differently. Always verify with official tools.
Frequently Asked Questions
- How does the robots.txt path matching algorithm work?
- Path matching finds the most specific rule that applies to a URL. Rules are matched by checking if the URL path starts with the Disallow or Allow path value. When multiple rules match, the longest (most specific) path wins. For example, if Disallow: /admin/ and Allow: /admin/public/ both match, the longer Allow rule takes precedence for /admin/public/page.
- What happens if no rule matches my URL?
- If no Disallow or Allow rule matches the tested URL path, the URL is considered allowed by default. Search crawlers can access and index URLs that aren't explicitly blocked. This is why an empty robots.txt file (or one with only 'User-agent: *' and no Disallow directives) allows full site access.
- How do I test rules for a specific crawler?
- Enter the crawler's user-agent name (e.g., Googlebot, bingbot, Slurp) in the User-agent field. The tester first looks for rules specific to that user-agent. If no specific rules exist, it falls back to rules for '*' (wildcard), which applies to all crawlers. This matches how real crawlers interpret robots.txt.
- Do wildcards (*) work in robots.txt testing?
- Google supports * (matches any sequence) and $ (end-of-URL anchor) wildcards in robots.txt. However, not all crawlers support wildcards. This tester uses simple prefix matching for compatibility. For wildcard testing, verify with Google Search Console's robots.txt tester for Googlebot-specific behavior.
- What is the order of rule evaluation?
- Crawlers evaluate rules in order: 1) Find the user-agent group (specific agent or wildcard), 2) Collect all Allow and Disallow rules for that group, 3) Find rules that match the URL path, 4) Select the longest matching path, 5) Apply that rule (Allow or Disallow). If no rules match, access is allowed.
- Why is my URL blocked even though I have an Allow rule?
- The Allow rule might be less specific (shorter) than a matching Disallow rule. For example, with 'Allow: /page' and 'Disallow: /page/', the URL '/page/sub' matches both, but '/page/' is longer, so Disallow wins. To override a Disallow, the Allow path must be at least as long as (more specific than) the Disallow path; rule order does not matter under longest-match evaluation.