Discoverability Checks

D1 robots.txt, D2 XML sitemap, D3 Link header — what each means and how to fix it.

Discoverability answers a simple question: can an agent actually find your pages? If the agent is blocked at the front door or has to guess at your URL structure, nothing else matters.

D1 — robots.txt present & sane (weight 6)

AIScan fetches /robots.txt and checks three things: it returns 200, it doesn't block every agent with a catch-all rule (User-agent: * with Disallow: /), and it advertises your Sitemap: URL.

How to fix

Serve a non-blocking robots file that references your sitemap. Minimum viable:

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
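
To make the three conditions concrete, here is a minimal sketch of how they could be evaluated against an already-fetched robots.txt. It is not AIScan's actual implementation: the function name check_robots, the result keys, and the simplified group parsing are all assumptions for illustration, and the catch-all test only looks for a literal Disallow: / in a group that names *.

```python
def check_robots(status: int, body: str) -> dict[str, bool]:
    """Hypothetical helper: evaluate the three D1 conditions for a robots.txt."""
    agents: list[str] = []   # user agents named by the group currently being read
    in_rules = False         # True once the current group has at least one rule
    blocks_all = False
    has_sitemap = False

    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:
                # A User-agent line after rules starts a new group.
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            # Simplified heuristic: the catch-all group disallows the root.
            if field == "disallow" and value == "/" and "*" in agents:
                blocks_all = True
        elif field == "sitemap":
            has_sitemap = has_sitemap or bool(value)

    return {
        "returns_200": status == 200,
        "not_blocking_all": not blocks_all,
        "advertises_sitemap": has_sitemap,
    }
```

Passing the minimum viable file above with a 200 status should set all three flags to True, while a file that blocks only one named bot should still pass the catch-all condition.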

D2 — XML sitemap (weight 4)

AIScan looks at /sitemap.xml and /sitemap_index.xml. A valid sitemap lets agents discover every URL without crawling the entire site graph.

How to fix

WordPress — Yoast SEO, Rank Math, or core 5.5+ auto-generate one.
Shopify — generated automatically at /sitemap.xml.
Next.js / TanStack / SPA — generate one at build or via a server route.

Then reference the sitemap from robots.txt: a single Sitemap: line there is enough for agents to find it.
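
If your stack does not generate a sitemap for you, a build step can write one from a list of URLs. The following is a rough sketch using only the Python standard library; build_sitemap is a hypothetical helper name, and where the URL list comes from (routes, CMS export, database) is left to your setup.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Hypothetical helper: return a minimal sitemap.xml document as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL text.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Write the returned string to /sitemap.xml at build time, or return it from a server route with an XML content type.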

D3 — Link header for discovery (weight 3)

AIScan inspects the HTTP Link response header on your homepage. A rel=api-catalog or rel=describedby entry points agents at a machine-readable description of what your site exposes — without them having to guess.

How to fix

Add a header at your CDN or framework level:

Link: </.well-known/api-catalog>; rel="api-catalog"

On content sites this check is informational. If you don't have an API catalog yet, start with P1 API Catalog.
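
To verify the header after deploying it, you can pull the discovery relations out of the Link value yourself. This is a simplified sketch, not AIScan's parser: discovery_links and DISCOVERY_RELS are hypothetical names, and the regular expressions assume link targets contain no ">" and parameters contain no "<".

```python
import re

# The two relation types this check looks for.
DISCOVERY_RELS = {"api-catalog", "describedby"}

# A link target in angle brackets, followed by its parameters up to the next "<".
_LINK = re.compile(r"<([^>]*)>([^<]*)")
# The rel parameter, quoted or unquoted.
_REL = re.compile(r'\brel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def discovery_links(header: str) -> dict[str, str]:
    """Hypothetical helper: map each discovery rel found to its link target."""
    found: dict[str, str] = {}
    for target, params in _LINK.findall(header):
        match = _REL.search(params)
        if not match:
            continue
        # A rel value may list several space-separated relation types.
        rels = (match.group(1) or match.group(2) or "").lower().split()
        for rel in rels:
            if rel in DISCOVERY_RELS:
                found.setdefault(rel, target)  # keep the first target per rel
    return found
```

An empty result for your homepage's Link header means neither relation was found.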