How the evidence maps to the framework
Core thesis
The academic literature validates the problem; AI Knowledge Signal
productises the response. Aggarwal et al. optimise the page. Jin et al.
(CORE) optimise the output rank. Chen et al. explain the AI-search shift.
AI Knowledge Signal engineers the whole knowledge supply chain.
The three foundational papers
Three empirical studies anchor the framework. Each identifies a real
mechanic of AI-mediated discovery; the framework turns those mechanics
into an operating model organisations can implement responsibly.
Optimises the page
Aggarwal et al. (2024)
Tests content-level methods (citations, statistics, quotations,
fluency, readability) and finds they can lift visibility in
generative answers by up to 40%, while keyword stuffing performs
poorly.
What AI Knowledge Signal (AKS) adds
Turns page-level GEO tactics into a full six-phase knowledge-publication
framework.
Optimises the output rank
Jin et al. (2026)
Shows LLM-based search rankings can be influenced by the retrieved
content and its initial order — especially reasoning- and review-style
text — and names the manipulation risk explicitly.
What AKS adds
Reframes ranking influence as ethical, evidence-backed representation
engineering — not covert manipulation.
Explains the AI-search shift
Chen et al. (2026)
An empirical comparison showing AI answer engines diverge from Google
in cited domains, source typology, freshness signals, and pre-training
effects.
What AKS adds
Converts the empirical findings into a practical model for publishing,
structuring, validating, and monitoring AI-facing knowledge.
What the AI providers say
Beyond the academic literature, each major AI provider publishes official
guidance on how its systems crawl, rank, and cite the web. The framework
aligns to what providers state on the record — summarised here with links
to the primary documentation.
| Provider / surface | What official guidance says | Official source(s) |
| --- | --- | --- |
| Google Search (AI Overviews + AI Mode). Confidence: very high | Apply normal SEO fundamentals: publish unique, useful, people-first, non-commodity content; keep pages crawlable, indexable, and snippet-eligible; align visible text with schema; use internal linking and good page experience. Not required: llms.txt or AI-specific text files, artificial content chunking, rewriting purely for AI, inauthentic mentions, or over-focusing on schema as an AI-specific lever. | Optimizing for generative AI features; AI features and your website |
| OpenAI (ChatGPT Search + Atlas). Confidence: high (access) | Allow OAI-SearchBot for ChatGPT Search discovery and citation; allow published OpenAI IPs through your CDN/WAF; keep the site public and crawlable; improve accessibility/ARIA for the ChatGPT agent in Atlas. Separate the policies for OAI-SearchBot, GPTBot (training), and ChatGPT-User. No broad content-format playbook published; format levers (FAQ schema, tables, answer blocks) are not officially validated for ChatGPT Search. | Overview of OpenAI Crawlers; Publishers & Developers FAQ; ChatGPT Search |
| Anthropic (Claude). Confidence: high (access) | Choose which Anthropic robots to allow by goal: ClaudeBot (possible model training), Claude-User (user-directed retrieval), Claude-SearchBot (search-result quality and visibility). Anthropic's bots respect robots.txt and Crawl-delay; robots.txt is the official opt-out (IP blocking may not reliably opt out). No official public GEO/content-optimisation playbook for Claude; third-party Claude guides are interpretation, not provider confirmation. | Anthropic crawlers & site-owner blocking |
| Microsoft (Copilot + Bing AI). Confidence: very high | The strongest official content-structure guidance: traditional SEO baseline plus schema (JSON-LD), clear headings, modular layouts, semantic clarity, measurable facts, bullets/numbers, concise answers, Q&A blocks, tables, and self-contained phrasing. Flags as risks: long walls of text, answers hidden in tabs/expandables, core info trapped in PDFs or images, overloaded sentences, and unanchored claims. | Optimizing content for AI Search Answers; AI Performance in Bing Webmaster Tools |
| Perplexity. Confidence: high (access) | Allow PerplexityBot in robots.txt and permit published IP ranges so the site can appear in Perplexity results; Perplexity-User supports user actions and can visit pages to provide accurate, linked answers. PerplexityBot is not used for foundation-model pre-training. No full public content-structure playbook; blocked pages may still surface domain, headline, and a brief factual summary. | Perplexity Crawlers; How Perplexity follows robots.txt |
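The access guidance in the table can be sketched as one robots.txt policy per crawler token. The example below is a minimal illustration, not a recommended configuration: it assumes a site that wants search and user-directed retrieval but opts out of training crawlers, and that split is a hypothetical choice. The helper names `build_robots_txt` and `is_allowed` and the `example.com` URL are also assumptions. Python's standard `urllib.robotparser` is used only as a local sanity check; each provider's own parser may interpret the file differently.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens as named in the provider guidance summarised above.
SEARCH_BOTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]
USER_BOTS = ["ChatGPT-User", "Claude-User", "Perplexity-User"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot"]


def build_robots_txt(allow, block):
    """Return robots.txt text with one group per crawler token."""
    groups = [f"User-agent: {bot}\nAllow: /" for bot in allow]
    groups += [f"User-agent: {bot}\nDisallow: /" for bot in block]
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt, bot, url):
    """Check a crawler token against the policy with the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


# Illustrative policy (an assumption): allow search and user-directed
# agents, block the agents the table associates with model training.
robots_txt = build_robots_txt(allow=SEARCH_BOTS + USER_BOTS, block=TRAINING_BOTS)

if __name__ == "__main__":
    print(robots_txt)
    for bot in SEARCH_BOTS + USER_BOTS + TRAINING_BOTS:
        print(bot, is_allowed(robots_txt, bot, "https://example.com/guide"))
```

Keeping one group per token mirrors the guidance to separate the policies for search, training, and user-directed agents, so a later change to one of them does not silently affect the others. IP allow-listing at the CDN/WAF and any Crawl-delay directive are outside this sketch.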
Academic source bank
The full evidence base behind the framework — peer-reviewed papers and
standards sources, each linked to its canonical version, with the
contribution it makes to the six phases.
| Paper / source | Type | Primary relevance to the framework |
| --- | --- | --- |
| Aggarwal et al. (2024), GEO: Generative Engine Optimization | Academic paper | Direct GEO evidence: citations, quotations, statistics, fluency, and content presentation improve visibility; keyword stuffing performs poorly. |
| Jin et al. (2026), Controlling Output Rankings in Generative Engines (CORE) | Academic paper | LLM-based rankings are strongly influenced by retrieved content and initial retrieval order; content can shape output ranking. |
| Chen et al. (2026), Navigating the Shift: Web Search and Generative AI Response Generation | Academic paper | AI answer engines diverge from Google in cited domains, source typology, freshness, and pre-training effects. |
| Liu, Zhang & Liang (2023), Evaluating Verifiability in Generative Search Engines | Academic paper | Generative search requires citation recall and precision; unsupported statements and weak citations reduce trust. |
| Menick et al. (2022), Teaching Language Models to Support Answers with Verified Quotes (GopherCite) | Academic paper | Open-book QA with specific evidence and quotes improves appraisal of correctness; uncertainty handling is part of trust. |
| Nakano et al. (2021), WebGPT: Browser-assisted Question-answering with Human Feedback | Academic paper | Web-browsing QA uses search, navigation, and references to support long-form answers. |
| Guu et al. (2020), REALM: Retrieval-Augmented Language Model Pre-Training | Academic paper | Retrieval-augmented models attend over documents, making accessible and retrievable content foundational. |
| Mialon et al. (2023), Augmented Language Models: a Survey | Academic paper | Augmented LMs use external tools and modules, including retrieval, expanding context beyond model parameters. |
| Brin & Page (1998), The Anatomy of a Large-Scale Hypertextual Web Search Engine | Academic paper | Classic search architecture uses crawling, indexing, and hyperlink structure; connected information architecture matters. |
| Kumar, Shaik & Furqan (2019), A Survey on Search Engine Optimization Techniques | Academic paper | SEO literature supports crawlability, page structure, links, and technical hygiene, while GEO evidence shows classic keyword stuffing is insufficient. |
| Liu et al. (2023), G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment | Academic paper | LLM-based evaluation can assess subjective response quality using structured criteria; useful for monitoring AI visibility. |
| Wan, Wallace & Klein (2024), What Evidence Do Language Models Find Convincing? | Academic paper | RAG models rely heavily on query relevance; corpus and evidence quality are central to trustworthy outputs. |
| Qin et al. (2024), LLMs are Effective Text Rankers with Pairwise Ranking Prompting | Academic paper | LLMs can operate as rankers; pairwise ranking supports benchmarking and comparative visibility measurement. |
| Schema.org / Google Structured Data Documentation | Standards source | Structured data gives machines explicit entity and relationship metadata using shared vocabularies. |
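The citation recall and precision named in the Liu, Zhang & Liang entry can be illustrated with a small worked calculation. This is a simplified sketch under stated assumptions, not the paper's evaluation protocol: recall is taken as the share of statements fully supported by their citations, precision as the share of individual citations that support the statement they are attached to, and the support judgements are assumed to come from a human or automated reviewer. The `Statement` class, its field names, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One statement from an AI answer, with reviewer judgements attached."""
    text: str
    fully_supported: bool  # do its citations, taken together, back the statement?
    citation_supports: list = field(default_factory=list)  # one bool per citation


def citation_recall(statements):
    """Share of statements that are fully supported by their citations."""
    if not statements:
        return 0.0
    return sum(s.fully_supported for s in statements) / len(statements)


def citation_precision(statements):
    """Share of individual citations that support their statement."""
    flags = [flag for s in statements for flag in s.citation_supports]
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


# Hypothetical review of a four-statement answer.
statements = [
    Statement("Claim A", True, [True, True]),
    Statement("Claim B", True, [True, False]),
    Statement("Claim C", False, [False]),
    Statement("Claim D", False, []),  # uncited statement lowers recall only
]

if __name__ == "__main__":
    print(citation_recall(statements))     # 2 of 4 statements -> 0.5
    print(citation_precision(statements))  # 3 of 5 citations -> 0.6
```

Tracking the two numbers separately shows whether an answer's weakness is missing support (low recall) or irrelevant citations (low precision), which is the distinction the monitoring side of the framework relies on.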
Methodology & scope
How this evidence base was assembled
Compiled from the AI Knowledge Signal GEO Framework Evidence Review
(prepared 5 May 2026). The base comprises three core empirical papers
— Aggarwal et al. (2024), Jin et al. (2026, CORE), and Chen et al.
(2026) — supported by eleven further academic and standards sources, and the
official guidance of five AI providers, mapped across the framework's six
phases. Official provider statements are distinguished from third-party
interpretation; where a provider has not published guidance on a topic, that
gap is stated rather than inferred. Links resolve to the canonical version of
each source (DOI, arXiv, ACL Anthology, or the provider's own documentation).