
Overview

“Missing from Site” identifies URLs listed in your sitemap.xml that weren’t found during the crawl. These are typically pages that return 404 errors, have been deleted, or are inaccessible—yet they’re still declared in your sitemap as valid pages.
Sitemap vs Reality
═══════════════════════════════════════════════════════════

sitemap.xml declares:          Actual site returns:
─────────────────────          ────────────────────
/about           ────────────→  200 OK ✓
/products        ────────────→  200 OK ✓
/old-product     ────────────→  404 Not Found ✗
/deleted-page    ────────────→  404 Not Found ✗
/typo-url        ────────────→  404 Not Found ✗

                                    └─ These are "Missing from Site"

Why This Matters

Listing non-existent pages in your sitemap causes several problems:
Issue                   Impact
─────────────────────   ─────────────────────────────────────────────
Wasted crawl budget     Search engines spend resources on dead pages
Poor indexing signals   Indicates poor site maintenance
Sitemap trust erosion   Search engines may deprioritize your sitemap
User experience         Users following sitemap-based links hit 404s
SEO authority loss      Broken pages can’t pass link equity

Google’s sitemap guidance is to list only the canonical URLs you want indexed, which in practice means URLs that return a 200 status. Listing URLs that return 404 works against that guidance and can reduce crawl efficiency.

How We Detect Missing Pages

Our analysis cross-references your sitemap with crawl results:
1. Fetch Sitemap

We download and parse your sitemap.xml (and any sitemap index files).

2. Track Crawl Results

During the crawl, we record the HTTP status of every URL encountered.

3. Match Sitemap URLs

We check each sitemap URL against our crawl results.

4. Flag Missing Pages

URLs in the sitemap that returned 404, 410, or weren’t reachable are flagged.
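
The matching step can be illustrated with a minimal sketch. This is not our crawler's actual implementation: the function name findMissingFromSite and the data shapes (an array of sitemap URLs, a Map from URL to HTTP status) are assumptions chosen for illustration, and a URL absent from the crawl results is treated here as unreachable.

```javascript
// Sketch only: the data shapes below are assumptions, not the tool's internals.
// sitemapUrls:  array of URL strings parsed from sitemap.xml
// crawlResults: Map of URL -> HTTP status code recorded during the crawl

const MISSING_STATUSES = new Set([404, 410])

function findMissingFromSite(sitemapUrls, crawlResults) {
  const missing = []
  for (const url of sitemapUrls) {
    if (!crawlResults.has(url)) {
      // The crawl never recorded a response for this URL
      missing.push({ url, reason: 'unreachable' })
      continue
    }
    const status = crawlResults.get(url)
    if (MISSING_STATUSES.has(status)) {
      missing.push({ url, reason: `status ${status}` })
    }
  }
  return missing
}

const crawlResults = new Map([
  ['https://example.com/about', 200],
  ['https://example.com/old-product', 404],
  ['https://example.com/retired', 410],
])

const flagged = findMissingFromSite(
  [
    'https://example.com/about',
    'https://example.com/old-product',
    'https://example.com/retired',
    'https://example.com/typo-url',
  ],
  crawlResults
)
console.log(flagged)
```

In this example, /about is left alone, while /old-product, /retired, and /typo-url are flagged.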

Common Causes

1. Deleted Content

Content removed without updating the sitemap:
Timeline:
─────────────────────────────────────────
Jan 1:   /winter-sale created, added to sitemap
Jan 31:  /winter-sale deleted
Feb 15:  Sitemap still contains /winter-sale ← Problem

2. URL Structure Changes

Restructuring URLs without proper cleanup:
Old URL             New URL              Sitemap Status
────────────────    ─────────────────    ────────────────────
/blog/my-post       /articles/my-post    Still lists old URL
/products/widget    /shop/widget         Still lists old URL
/about-us           /about               Still lists old URL

3. CMS Migration Issues

Moving between content management systems:
  • Import errors creating bad URLs
  • Different URL slug generation rules
  • Missing content from partial migrations
  • Draft content included by mistake

4. Auto-Generated Sitemaps

CMS plugins that don’t clean up properly:
WordPress Example:
─────────────────────────────────────────
Plugin auto-adds all posts to sitemap ✓
Post is trashed ✓  
Post is deleted ✓
Sitemap still references post ✗ ← Plugin didn't update

5. Manual Sitemap Errors

Hand-maintained sitemaps with mistakes:
  • Typos in URLs
  • Copy-paste errors
  • Forgotten test URLs
  • Outdated entries never reviewed

6. Staging/Dev URLs

Test URLs accidentally included:
<!-- These shouldn't be in production sitemap -->
<url>
  <loc>https://example.com/test-page-123</loc>
</url>
<url>
  <loc>https://example.com/staging-preview</loc>
</url>
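
One way to catch entries like these before publishing is to scan the sitemap's URLs for test-like patterns. The sketch below is only a starting point: the function name findSuspiciousUrls and the pattern list are hypothetical, so adjust them to your own naming conventions.

```javascript
// Hypothetical patterns that often indicate non-production URLs.
const SUSPICIOUS_PATTERNS = [
  /\/test[-/]/i, // paths such as /test-page-123 or /test/page
  /staging/i,
  /preview/i,
  /^https?:\/\/(dev|staging)\./i, // dev. or staging. hostnames
]

function findSuspiciousUrls(urls) {
  return urls.filter(url => SUSPICIOUS_PATTERNS.some(pattern => pattern.test(url)))
}

const suspicious = findSuspiciousUrls([
  'https://example.com/test-page-123',
  'https://example.com/staging-preview',
  'https://example.com/testimonials',
  'https://example.com/about',
])
console.log(suspicious)
```

Pattern matching like this will produce false alarms on some sites, so treat the output as a review list rather than an automatic removal list.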

How to Fix

Step 1: Verify the Issue

First, confirm the pages are truly missing:
Visit each URL directly:
https://example.com/missing-page

Expected: 404 Not Found page
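
For more than a handful of URLs, the same check can be scripted. The following is a minimal sketch, assuming Node 18 or later (which provides a global fetch); the function name checkUrls and the injectable fetchFn parameter are illustrative choices, not part of any tool.

```javascript
// Sketch: request each URL and record the HTTP status it returns.
// fetchFn defaults to the global fetch and can be swapped out for testing.
async function checkUrls(urls, fetchFn = fetch) {
  const results = []
  for (const url of urls) {
    try {
      // In Node, redirect: 'manual' returns the 3xx response instead of following it
      const response = await fetchFn(url, { method: 'HEAD', redirect: 'manual' })
      results.push({ url, status: response.status })
    } catch (error) {
      // Network-level failure (DNS error, timeout, refused connection)
      results.push({ url, status: null, error: error.message })
    }
  }
  return results
}
```

Call checkUrls with an array of flagged URLs and inspect the status of each result. Some servers reject HEAD requests, so a 405 response is a reason to retry with GET rather than evidence that the page is missing.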

Step 2: Decide on Action

For each missing URL, choose an approach:
Scenario                         Action
─────────────────────────────    ────────────────────────────────────
Content moved to new URL         Set up 301 redirect, update sitemap
Content intentionally deleted    Remove from sitemap
Content should exist             Restore or recreate the page
URL has typo in sitemap          Fix the URL in sitemap
Duplicate/outdated entry         Remove from sitemap

Step 3: Update Your Sitemap

Remove or fix the problematic entries:
<!-- REMOVE these entries from sitemap.xml -->

<!-- Deleted page - remove entirely -->
<url>
  <loc>https://example.com/deleted-page</loc>
  <lastmod>2024-01-15</lastmod>
</url>

<!-- Moved page - remove old URL (redirect handles SEO) -->
<url>
  <loc>https://example.com/old-url</loc>
  <lastmod>2024-01-15</lastmod>
</url>

<!-- ADD the new location if it moved -->
<url>
  <loc>https://example.com/new-url</loc>
  <lastmod>2024-12-01</lastmod>
</url>

Step 4: Set Up Redirects (If Applicable)

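For moved content, configure permanent (301) redirects. The example uses Apache mod_alias directives, which go in .htaccess or the server configuration:
# Redirect a single page
Redirect 301 /old-page /new-page

# Redirect every URL under /blog/ to the same path under /articles/
RedirectMatch 301 ^/blog/(.*)$ /articles/$1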

Step 5: Regenerate Sitemap

After cleanup, regenerate your sitemap:
CMS          How to Regenerate
─────────    ──────────────────────────────
WordPress    Use Yoast/RankMath to rebuild
Shopify      Automatic (wait 24-48 hours)
Next.js      Rebuild with next build
Static       Run sitemap generator script
Manual       Edit sitemap.xml directly

Preventing Future Issues

Automated Sitemap Generation

Use tools that automatically maintain sitemap accuracy:
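// Example: Dynamic sitemap generation
// getAllPublishedPages() stands in for your own CMS or database query.
export async function generateSitemap() {
  const pages = await getAllPublishedPages() // Only live pages

  return pages
    .filter(page => page.status === 'published') // Defensive: skip drafts and trashed pages
    .filter(page => !page.noindex) // Skip pages excluded from indexing
    .map(page => ({
      url: page.url,
      lastmod: page.updatedAt,
    }))
}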
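
Entries in the { url, lastmod } shape used above still need to be serialized to XML. Here is a minimal sketch of that step; the helper names escapeXml, toW3cDate, and renderSitemap are hypothetical, and lastmod is assumed to be a Date or an ISO date string.

```javascript
// Escape characters that are not allowed verbatim in XML text
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// Format a Date or ISO string as YYYY-MM-DD (UTC)
function toW3cDate(value) {
  return new Date(value).toISOString().slice(0, 10)
}

function renderSitemap(entries) {
  const urls = entries.map(({ url, lastmod }) => {
    // lastmod is optional, so only emit the tag when a value is present
    const lastmodTag = lastmod ? `\n    <lastmod>${toW3cDate(lastmod)}</lastmod>` : ''
    return `  <url>\n    <loc>${escapeXml(url)}</loc>${lastmodTag}\n  </url>`
  })
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n')
}

const xml = renderSitemap([
  { url: 'https://example.com/new-url', lastmod: '2024-12-01' },
  { url: 'https://example.com/search?q=a&page=2' },
])
console.log(xml)
```

Escaping matters because URLs with query strings often contain an ampersand, which makes the sitemap invalid XML if written unescaped.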

Pre-Delete Checklist

Before deleting any page:
1. Check Inbound Links

Are other pages linking to this URL?

2. Check Search Rankings

Is this page ranking for valuable keywords?

3. Set Up Redirect

If the page is valuable, redirect it to a relevant alternative.

4. Update Sitemap

Remove the URL from sitemap.xml.

5. Update Internal Links

Fix any internal links pointing to the deleted page.

Regular Audits

Schedule sitemap health checks:
Frequency              Action
───────────────────    ─────────────────────────────────────────
Weekly                 Automated sitemap validation
Monthly                Manual review of sitemap entries
Quarterly              Full sitemap audit with crawl comparison
After major changes    Immediate sitemap verification

Set up monitoring alerts for sitemap changes. Any significant increase in sitemap size without corresponding content creation could indicate issues.
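
A size alert of this kind can be as simple as comparing two sitemap snapshots against the number of pages actually published in between. The sketch below assumes you already track those three numbers; the function name checkSitemapGrowth and the 5% tolerance are illustrative assumptions, not recommended values.

```javascript
// Sketch: flag sitemap growth that published content does not explain.
// tolerance is the share of the previous size allowed as unexplained growth.
function checkSitemapGrowth({ previousCount, currentCount, newPagesPublished }, tolerance = 0.05) {
  const growth = currentCount - previousCount
  const unexplained = growth - newPagesPublished
  const allowed = Math.ceil(previousCount * tolerance)
  return { growth, unexplained, alert: unexplained > allowed }
}

const suspiciousGrowth = checkSitemapGrowth({
  previousCount: 500,
  currentCount: 620,
  newPagesPublished: 10,
})
const normalGrowth = checkSitemapGrowth({
  previousCount: 500,
  currentCount: 512,
  newPagesPublished: 10,
})
console.log(suspiciousGrowth.alert, normalGrowth.alert)
```

In the first case the sitemap grew by 120 URLs while only 10 pages were published, so the check raises an alert. In the second case the 2 unexplained URLs fall within the tolerance.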

Interpreting Results

Severity Assessment

Missing Pages    Severity        Recommended Action
─────────────    ────────────    ───────────────────────────────
0                ✅ Excellent    Sitemap is accurate
1-5              🟡 Low          Fix during regular maintenance
6-20             🟠 Medium       Prioritize cleanup this week
21+              🔴 High         Immediate sitemap audit needed
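
If you want to apply these bands in your own reporting, they translate into a small lookup. This is a sketch with a hypothetical function name; it treats exactly 20 missing pages as Medium and anything above 20 as High.

```javascript
// Map a missing-page count to the severity bands in the table above
function severityForMissingCount(count) {
  if (count === 0) return 'Excellent'
  if (count <= 5) return 'Low'
  if (count <= 20) return 'Medium'
  return 'High'
}

console.log(severityForMissingCount(15))
```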

Percentage Threshold

Consider the ratio of missing pages:
Missing Pages: 15
Total Sitemap URLs: 500
Percentage: 3%

Assessment: Acceptable, but should be fixed
Target: < 1% missing pages
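
The ratio is a straightforward calculation. A minimal sketch, using a hypothetical helper name and the 1% target stated above:

```javascript
// Percentage of sitemap URLs that were flagged as missing
function missingPercentage(missingCount, totalSitemapUrls) {
  if (totalSitemapUrls === 0) return 0 // Empty sitemap: nothing can be missing
  return (missingCount * 100) / totalSitemapUrls
}

const percentage = missingPercentage(15, 500)
const meetsTarget = percentage < 1
console.log(`${percentage.toFixed(1)}% missing, target met: ${meetsTarget}`)
// prints "3.0% missing, target met: false"
```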

False Positives

Some URLs may appear missing but aren’t problems:
Scenario              Explanation
──────────────────    ────────────────────────────────────────
Recently published    Page exists but wasn’t in this crawl
Protected content     Login required, crawl couldn’t access
Rate limiting         Server blocked crawler temporarily
Geo-restricted        Page not available in crawler’s region

If you believe a page is incorrectly flagged, manually verify it exists by visiting the URL directly. Our crawler may have been blocked or rate-limited.
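
When re-checking flagged URLs yourself, the HTTP status usually hints at which case applies. The sketch below is an assumption about how to group statuses, not how our crawler classifies them: the function name and category labels are hypothetical, and geo-restriction has no single reliable status code, so those cases fall through to manual review.

```javascript
// Sketch: group a re-checked status code into a likely explanation.
// status may be null when the request failed at the network level.
function classifyFlaggedUrl(status) {
  if (status === 404 || status === 410) return 'confirmed-missing'
  if (status === 401 || status === 403) return 'possibly-protected'
  if (status === 429) return 'possibly-rate-limited'
  if (status >= 200 && status < 300) return 'exists'
  return 'needs-manual-review'
}

console.log(classifyFlaggedUrl(429))
```

Only URLs in the confirmed-missing group should be removed from the sitemap without further investigation.
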

Related Issues

Missing pages often indicate broader problems:
Related Issue            Connection
─────────────────────    ───────────────────────────────────────────
Orphan pages             May have been orphaned before deletion
Broken internal links    Other pages may link to missing URLs
Redirect chains          Missing pages may be part of broken chains
Index bloat              Old URLs may still be indexed

Next Steps