Sitemap Validator
Validate sitemap XML structure, count URLs, detect duplicates, and check sitemap compliance.
About Sitemap Validator
This sitemap validator parses XML sitemap files and reports on structure, URL count, duplicate detection, and format compliance. It supports both standard sitemaps (urlset) and sitemap indexes (sitemapindex), helping SEO specialists and developers verify sitemap correctness before submission to search engines.
The validator uses browser-based XML parsing (DOMParser) to analyze sitemap structure without sending data to any server. All processing happens locally in your browser.
Sitemap XML Structure Overview
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page1</loc>
    <lastmod>2026-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/page2</loc>
    <lastmod>2026-01-10</lastmod>
  </url>
</urlset>
Elements:
- <loc> - Required. Absolute URL (including protocol)
- <lastmod> - Optional. Last modification date (ISO 8601)
- <changefreq> - Optional. always/hourly/daily/weekly/monthly/yearly/never
- <priority> - Optional. 0.0 to 1.0 (default 0.5)
Sitemap Index Format
For sites with more than 50,000 URLs or files larger than 50MB (uncompressed), use a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap1.xml</loc>
    <lastmod>2026-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap2.xml</loc>
    <lastmod>2026-01-14</lastmod>
  </sitemap>
</sitemapindex>
JavaScript Validation Algorithm
// Sitemap Validator Implementation
function validateSitemap(xmlString) {
  // Parse XML using DOMParser
  const parser = new DOMParser();
  const doc = parser.parseFromString(xmlString, "text/xml");

  // Check for parse errors
  const parseError = doc.querySelector("parsererror");
  if (parseError) {
    return { valid: false, error: "Invalid XML: " + parseError.textContent };
  }

  const root = doc.documentElement;
  const report = [];

  // Check root element
  report.push("Root element: " + root.nodeName);

  if (root.nodeName === "urlset") {
    // Standard sitemap
    const urls = Array.from(doc.getElementsByTagName("url"));
    const locs = urls.map(url => {
      const loc = url.getElementsByTagName("loc")[0];
      return loc ? loc.textContent.trim() : "";
    }).filter(Boolean);

    // Detect duplicates
    const duplicates = locs.filter(
      (loc, index) => locs.indexOf(loc) !== index
    );

    report.push("URL count: " + locs.length);
    report.push("Duplicate URLs: " + duplicates.length);
    report.push("Sample URLs: " + locs.slice(0, 5).join(", "));
  } else if (root.nodeName === "sitemapindex") {
    // Sitemap index
    const sitemaps = Array.from(doc.getElementsByTagName("sitemap"));
    report.push("Nested sitemap count: " + sitemaps.length);
  }

  return { valid: true, report: report };
}
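The indexOf-based duplicate filter above rescans the whole array for every element, which is O(n²) on a 50,000-URL sitemap. A Set-based pass does the same job in O(n); a minimal sketch (findDuplicateLocs is an illustrative helper name, not part of the tool above):

```javascript
// Sketch: O(n) duplicate detection with a Set.
// Returns each duplicated URL once, regardless of how often it repeats.
function findDuplicateLocs(locs) {
  const seen = new Set();
  const duplicates = new Set();
  for (const loc of locs) {
    if (seen.has(loc)) {
      duplicates.add(loc); // second and later occurrences only
    }
    seen.add(loc);
  }
  return [...duplicates];
}
```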
Sitemap Requirements and Limits
| Requirement | Specification | Notes |
|---|---|---|
| Max URLs per sitemap | 50,000 URLs | Use sitemapindex for more |
| Max file size | 50MB uncompressed | Compressed .gz doesn't count |
| URL format | Absolute URL required | Must include https:// |
| Encoding | UTF-8 | Declare in XML header |
| Namespace | sitemap.org/schemas/sitemap/0.9 | Required for urlset |
| Required element | <loc> | All other elements optional |
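The limits in the table can be checked programmatically before submission. A minimal sketch, assuming the URL count and uncompressed byte size have already been measured (checkSitemapLimits is a hypothetical helper; the 50,000-URL and 50MB figures come from the table above):

```javascript
// Sketch: compare a sitemap's URL count and uncompressed byte size
// against the sitemaps.org limits (50,000 URLs, 50MB uncompressed).
const MAX_URLS = 50000;
const MAX_BYTES = 50 * 1024 * 1024;

function checkSitemapLimits(urlCount, uncompressedBytes) {
  const issues = [];
  if (urlCount > MAX_URLS) {
    issues.push("URL count " + urlCount + " exceeds " + MAX_URLS + "; split with a sitemap index");
  }
  if (uncompressedBytes > MAX_BYTES) {
    issues.push("Uncompressed size exceeds 50MB; split with a sitemap index");
  }
  return issues;
}
```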
Common Sitemap Errors
| Error | Cause | Fix |
|---|---|---|
| Invalid XML | Missing closing tags, unescaped characters | Use XML validator, escape & < > |
| Relative URLs | Missing protocol/domain | Use full https://example.com/page |
| Duplicate URLs | Same URL listed multiple times | Remove duplicates, canonicalize |
| Wrong namespace | Missing or incorrect xmlns attribute | Add xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" |
| Exceeds limits | >50,000 URLs or >50MB | Split into multiple sitemaps with index |
| Non-canonical URLs | Redirects, 404s, noindex pages | Only include canonical, indexable URLs |
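The relative-URL error in the table can be caught with the standard URL constructor, which throws on relative input when no base URL is supplied. A sketch (isAbsoluteHttpUrl is an illustrative name):

```javascript
// Sketch: a <loc> value is acceptable only if it parses as an
// absolute http(s) URL; relative paths throw in the URL constructor.
function isAbsoluteHttpUrl(loc) {
  try {
    const url = new URL(loc);
    return url.protocol === "https:" || url.protocol === "http:";
  } catch {
    return false; // no base URL provided, so relative input fails to parse
  }
}
```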
Sitemap Best Practices
- Include only canonical URLs: Exclude redirects, 404s, noindex pages, and duplicate content
- Use absolute URLs: Always include full URL with protocol (https://example.com/page)
- Keep sitemaps updated: Update when content changes; accurate lastmod dates help crawl prioritization
- Split large sitemaps: Use sitemapindex when approaching 50,000 URLs or 50MB limit
- Reference in robots.txt: Add "Sitemap: https://example.com/sitemap.xml" to robots.txt
- Submit to Search Console: Submit via Google Search Console and Bing Webmaster Tools
- Use gzip compression: Compress sitemaps to .xml.gz for faster download (limit still 50MB uncompressed)
- Validate before submission: Check XML validity and structure before submitting to search engines
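The "split large sitemaps" practice amounts to chunking the URL list before writing each chunk to its own file. A minimal sketch (splitIntoSitemaps is a hypothetical helper; chunkSize defaults to the 50,000-URL limit):

```javascript
// Sketch: partition a URL list into sub-sitemap chunks of at most
// chunkSize entries each, ready to be written as separate files.
function splitIntoSitemaps(urls, chunkSize = 50000) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += chunkSize) {
    chunks.push(urls.slice(i, i + chunkSize));
  }
  return chunks;
}
```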
How to Submit Sitemap to Search Engines
- Google Search Console:
- Verify site ownership in Search Console
- Navigate to "Sitemaps" in left menu
- Enter sitemap URL (e.g., sitemap.xml)
- Click "Submit"
- Monitor coverage and errors in report
- Bing Webmaster Tools:
- Verify site in Bing Webmaster Tools
- Go to "Sitemaps" section
- Submit sitemap URL
- Review crawl statistics
- robots.txt reference:
- Add the line: Sitemap: https://example.com/sitemap.xml
- Place it anywhere in robots.txt (top or bottom)
- Search engines discover the sitemap automatically
Sitemap Priority and Change Frequency Reference
| Element | Values | Default | Notes |
|---|---|---|---|
| <priority> | 0.0 to 1.0 | 0.5 | Relative importance within site |
| <changefreq> | always/hourly/daily/weekly/monthly/yearly/never | N/A | Hint for crawlers (often ignored) |
| <lastmod> | ISO 8601 date (YYYY-MM-DD) | N/A | Most important for crawl prioritization |
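The allowed values in the table map directly onto small per-field validators. A hedged sketch (function names are illustrative; the lastmod check covers only the date-only form of ISO 8601, though full timestamps are also valid):

```javascript
// Sketch: validators for the optional sitemap elements.
const CHANGEFREQ = new Set([
  "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
]);

function isValidChangefreq(value) {
  return CHANGEFREQ.has(value);
}

function isValidPriority(value) {
  const n = Number(value);
  return value.trim() !== "" && Number.isFinite(n) && n >= 0 && n <= 1;
}

// Date-only ISO 8601 (YYYY-MM-DD); must also parse as a real date.
function isValidLastmod(value) {
  return /^\d{4}-\d{2}-\d{2}$/.test(value) && !Number.isNaN(Date.parse(value));
}
```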
Sitemap Example for Different Site Types
E-commerce Site Sitemap Structure:
sitemap.xml (index file)
├── sitemap-products.xml (product pages)
├── sitemap-categories.xml (category pages)
├── sitemap-brands.xml (brand pages)
├── sitemap-blog.xml (blog posts)
└── sitemap-static.xml (about, contact, etc.)
Each sub-sitemap contains up to 50,000 URLs. The index file references all sub-sitemaps. Submit only the index file to Search Console.
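An index file like the one above can be generated mechanically from a list of sub-sitemap URLs. A minimal sketch (buildSitemapIndex and escapeXml are illustrative helpers; a real generator would take per-file lastmod dates):

```javascript
// Sketch: emit a sitemapindex document referencing each sub-sitemap.
function escapeXml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function buildSitemapIndex(sitemapUrls, lastmod) {
  const entries = sitemapUrls.map(url =>
    "  <sitemap>\n" +
    "    <loc>" + escapeXml(url) + "</loc>\n" +
    "    <lastmod>" + lastmod + "</lastmod>\n" +
    "  </sitemap>"
  ).join("\n");
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries + "\n" +
    "</sitemapindex>\n";
}
```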
Frequently Asked Questions
- What is a sitemap and why is it important?
- A sitemap is an XML file that lists all important URLs on a website to help search engines discover and crawl content. It provides metadata about each URL (last modified date, change frequency, priority) and is essential for SEO, especially for large sites, new sites without many backlinks, and sites with deep page hierarchies.
- What is the difference between urlset and sitemapindex?
- A urlset is a standard sitemap containing individual URLs (up to 50,000 URLs max). A sitemapindex is a sitemap of sitemaps, used when a site exceeds 50,000 URLs or 50MB uncompressed. Each entry in a sitemapindex points to another sitemap file. Google can process both formats and will follow sitemapindex references automatically.
- What are the sitemap XML requirements?
- Sitemap XML must: use UTF-8 encoding, have a proper XML declaration, use the sitemap.org namespace (http://www.sitemaps.org/schemas/sitemap/0.9), contain valid <url> elements with <loc> tags, escape special characters (&, <, >), and be well-formed XML. File size must be under 50MB uncompressed, with a maximum of 50,000 URLs per sitemap.
- How do I submit a sitemap to Google?
- Submit sitemaps via Google Search Console: verify site ownership, go to Sitemaps section, enter sitemap URL (e.g., sitemap.xml), click Submit. Also reference sitemap in robots.txt using 'Sitemap: https://example.com/sitemap.xml'. Google will crawl the sitemap and add URLs to its crawl queue.
- What common sitemap errors should I avoid?
- Common errors: including non-canonical URLs (redirects, 404s, noindex pages), exceeding the 50,000-URL or 50MB limit, using an incorrect namespace, missing the XML declaration, including URLs blocked by robots.txt, listing duplicate URLs, not updating lastmod dates, and mixing absolute and relative URLs (all URLs must be absolute).
- How often should I update my sitemap?
- Update sitemaps whenever content changes: new pages added, URLs modified, content significantly updated. For dynamic sites, generate sitemaps automatically. For static sites, update manually after each change. Search engines use <lastmod> dates to prioritize crawling, so accurate dates improve crawl efficiency.