Bot Access Checks

B1 Content Signals, B2 AI bot rules, B3 Web Bot Auth.

The default User-agent: * rule is too coarse for the AI era. Agents that train models, agents that ground answers in your live site, and agents that act on a user's behalf are three different things — and you probably feel differently about each. Bot Access is where you say so.

B1 — Content Signals in robots.txt (weight 5, optional)

Cloudflare's Content-Signal directive is an emerging proposal that lets you state, in one line, what crawlers may do with your content. AIScan flags it as informational because the standard is still in draft — it cannot lower your score.

How to fix

# robots.txt
User-agent: *
Content-Signal: search=yes, ai-train=no, ai-input=yes
Allow: /

search — list this page in search results.
ai-train — use this page to train models.
ai-input — fetch this page to ground a single AI answer.
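To make the three signals concrete, here is a minimal Python sketch that reads a Content-Signal line into a dictionary of booleans. The function name parse_content_signal is hypothetical, and the parsing rules (comma-separated key=value pairs, yes/no values, malformed entries skipped) are assumptions drawn from the example above, not AIScan's or Cloudflare's actual parser.

```python
def parse_content_signal(line: str) -> dict[str, bool]:
    """Turn 'Content-Signal: search=yes, ai-train=no' into {'search': True, 'ai-train': False}."""
    name, _, value = line.partition(":")
    if name.strip().lower() != "content-signal":
        raise ValueError(f"not a Content-Signal line: {line!r}")

    signals: dict[str, bool] = {}
    for item in value.split(","):
        key, sep, flag = item.partition("=")
        key, flag = key.strip().lower(), flag.strip().lower()
        # Assumption: anything other than key=yes or key=no is ignored, not guessed at.
        if not sep or not key or flag not in ("yes", "no"):
            continue
        signals[key] = flag == "yes"
    return signals


signals = parse_content_signal("Content-Signal: search=yes, ai-train=no, ai-input=yes")
print(signals)  # {'search': True, 'ai-train': False, 'ai-input': True}
```

A signal that is absent from the result was simply not stated; the sketch does not treat a missing key as either yes or no.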

B2 — Explicit AI bot rules (weight 5)

AIScan looks for explicit User-agent: blocks that name known AI crawlers. Naming four or more passes; naming one to three earns partial credit. The current watchlist:

GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, anthropic-ai, Claude-User, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended, Bytespider, Amazonbot, meta-externalagent, CCBot, cohere-ai, Diffbot.
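The thresholds above can be sketched in a few lines of Python. This is an illustration only, not AIScan's implementation: the names named_ai_bots and grade are hypothetical, matching is assumed to be case-insensitive on the exact token, and the "fail" label for zero matches is an assumption, since the text only defines pass and partial credit.

```python
WATCHLIST = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "Claude-User", "PerplexityBot", "Perplexity-User", "Google-Extended",
    "Applebot-Extended", "Bytespider", "Amazonbot", "meta-externalagent",
    "CCBot", "cohere-ai", "Diffbot",
]


def named_ai_bots(robots_txt: str) -> set[str]:
    """Return the watchlist bots that have their own User-agent line."""
    canonical = {bot.lower(): bot for bot in WATCHLIST}
    found: set[str] = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            bot = canonical.get(value.strip().lower())
            if bot:
                found.add(bot)
    return found


def grade(robots_txt: str) -> str:
    """Four or more named bots pass; one to three earn partial credit."""
    count = len(named_ai_bots(robots_txt))
    if count >= 4:
        return "pass"
    if count >= 1:
        return "partial"
    return "fail"  # assumption: the source does not name the zero-match outcome


EXAMPLE = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Disallow: /
"""

print(grade(EXAMPLE))  # pass
```

Note that the sketch counts a bot whether its block allows or disallows it; what is being measured is that the bot is named explicitly.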

How to fix

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Replace Allow: / with Disallow: / for any bot you'd rather not feed.
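Before deploying, you may want to confirm that the rules do what you intend. The sketch below uses Python's standard urllib.robotparser to test a draft file in memory; the ROBOTS text, the allowed helper, and the example.com URL are placeholders for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Draft rules: feed ClaudeBot, refuse GPTBot.
ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /
"""


def allowed(robots_txt: str, bot: str, url: str = "https://example.com/") -> bool:
    """Report whether `bot` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from memory, no network request
    return parser.can_fetch(bot, url)


for bot in ("GPTBot", "ClaudeBot"):
    print(bot, "allowed" if allowed(ROBOTS, bot) else "blocked")
```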

B3 — Web Bot Auth key directory (weight 3, optional)

Web Bot Auth is an IETF draft for cryptographically identifying bots via HTTP Message Signatures (RFC 9421). AIScan checks for a JWKS-style directory at /.well-known/http-message-signatures-directory. This check is informational only because the standard is still experimental.
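As a rough illustration of what "JWKS-style" means here, the Python sketch below fetches the well-known path and checks that the response has the basic JWKS shape: a JSON object with a non-empty "keys" array whose entries each carry a "kty" member. The helper names, the timeout, and the sample document are hypothetical, the "x" value is a placeholder rather than a real key, and AIScan's actual check may be stricter or looser than this.

```python
import json
from urllib.request import urlopen

DIRECTORY_PATH = "/.well-known/http-message-signatures-directory"


def looks_like_key_directory(doc: object) -> bool:
    """True if doc is shaped like a JWKS: {'keys': [{'kty': ...}, ...]}."""
    if not isinstance(doc, dict):
        return False
    keys = doc.get("keys")
    if not isinstance(keys, list) or not keys:
        return False
    return all(isinstance(key, dict) and "kty" in key for key in keys)


def fetch_key_directory(origin: str, timeout: float = 5.0) -> object:
    """Download and decode the directory for an origin such as 'https://example.com'."""
    with urlopen(origin.rstrip("/") + DIRECTORY_PATH, timeout=timeout) as resp:
        return json.load(resp)


# Offline check against a hand-written sample (placeholder key material).
SAMPLE = {"keys": [{"kty": "OKP", "crv": "Ed25519", "x": "placeholder-public-key"}]}
print(looks_like_key_directory(SAMPLE))  # True
```

To check a live origin, pass the result of fetch_key_directory to looks_like_key_directory. A shape check like this says nothing about whether the keys are valid or whether any signature verifies.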

How to fix

If you operate an agent, publish your signing keys at the well-known path. Most site owners don't need to publish anything here.