Discoverability Checks
D1 robots.txt, D2 XML sitemap, D3 Link header — what each means and how to fix it.
Discoverability answers a simple question: can an agent actually find your pages? If the agent is blocked at the front door or has to guess at your URL structure, nothing else matters.
D1 — robots.txt present & sane (weight 6)
AIScan fetches /robots.txt and checks three things: it returns 200, it doesn't block every agent (User-agent: * with Disallow: /), and it advertises your Sitemap: URL.
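The three checks can be approximated locally before you rescan. The sketch below is not AIScan's implementation. It is a simplified reading of an already-fetched robots.txt, and the function name `check_robots` and the result keys are hypothetical. It treats any `Disallow: /` inside a `User-agent: *` group as a catch-all block, which is a deliberately strict assumption.

```python
def check_robots(status: int, body: str) -> dict:
    """Apply the three D1 checks to an already-fetched /robots.txt (sketch)."""
    agents: list[str] = []   # user-agents named by the group being read
    in_rules = False         # True once the current group has a rule line
    blocks_all = False
    sitemaps: list[str] = []

    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                blocks_all = True
        elif field == "sitemap":
            sitemaps.append(value)

    return {
        "returns_200": status == 200,
        "blocks_all_agents": blocks_all,
        "sitemaps": sitemaps,
    }
```

A file passes this sketch when `returns_200` is true, `blocks_all_agents` is false, and `sitemaps` is non-empty.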
How to fix
Serve a non-blocking robots file that references your sitemap. Minimum viable:
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
D2 — XML sitemap (weight 4)
AIScan looks at /sitemap.xml and /sitemap_index.xml. A valid sitemap lets agents discover every URL without crawling the entire site graph.
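To see what an agent would get from your sitemap, you can parse it yourself. This is a minimal sketch that assumes the standard sitemaps.org namespace. The helper name `read_sitemap` is hypothetical, and the sketch says nothing about how AIScan validates the file.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def read_sitemap(xml_bytes: bytes) -> tuple[str, list[str]]:
    """Return ("urlset" | "index", list of <loc> values) for a sitemap document."""
    root = ET.fromstring(xml_bytes)
    if root.tag == NS + "urlset":
        kind = "urlset"   # a plain sitemap listing page URLs
    elif root.tag == NS + "sitemapindex":
        kind = "index"    # an index whose <loc> entries point at child sitemaps
    else:
        raise ValueError(f"not a sitemap document: {root.tag}")
    locs = [el.text.strip() for el in root.iter(NS + "loc") if el.text]
    return kind, locs
```

For an index, each returned location is itself a sitemap that has to be fetched and read the same way.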
How to fix
- WordPress — Yoast SEO, Rank Math, or core 5.5+ auto-generate one.
- Shopify — generated automatically at /sitemap.xml.
- Next.js / TanStack / SPA — generate one at build or via a server route.
Reference it from robots.txt so a single Sitemap: line covers discovery.
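If your stack has no sitemap plugin, generating the file at build time takes only a few lines. The sketch below uses Python's standard library purely for illustration. The function name `build_sitemap` is hypothetical, and it assumes you already have the list of public URLs. It emits only the required `<loc>` element for each URL.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls: list[str]) -> str:
    """Render a minimal sitemap.xml document for the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Each entry is <url><loc>...</loc></url>; ElementTree escapes & and <.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Write the returned string to sitemap.xml in your public output directory, or return it from a server route with an XML content type.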
D3 — Link header for discovery (weight 3)
AIScan inspects the HTTP Link response header on your homepage. A rel=api-catalog or rel=describedby entry points agents at a machine-readable description of what your site exposes — without them having to guess.
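As a rough illustration of what an agent does with that header, the sketch below pulls the two discovery relations out of a Link header value. It is a simplified parser under stated assumptions, not AIScan's implementation: it does not handle every edge case of the Web Linking syntax (for example, commas inside quoted parameter values), and the name `discovery_links` is hypothetical.

```python
import re

DISCOVERY_RELS = {"api-catalog", "describedby"}

# One link-value: <target> followed by its ";param" list (simplified grammar).
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_REL = re.compile(r'\brel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)

def discovery_links(header: str) -> dict[str, str]:
    """Map each discovery relation found in a Link header to its target."""
    found: dict[str, str] = {}
    for target, params in _LINK.findall(header):
        match = _REL.search(params)
        if not match:
            continue
        # rel may hold several space-separated relation types.
        for rel in match.group(1).lower().split():
            if rel in DISCOVERY_RELS:
                found.setdefault(rel, target)
    return found
```

An empty result means the homepage response gave the agent nothing to follow, which is the condition this check is looking for.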
How to fix
Add a header at your CDN or framework level:
Link: </.well-known/api-catalog>; rel="api-catalog"
If you don't have an API catalog yet, this check is informational on content sites. Start with P1 API Catalog.