How the evidence maps to the framework

Core thesis

The academic literature validates the problem; AI Knowledge Signal (AKS) productises the response. Aggarwal et al. optimise the page. CORE (Jin et al.) optimises the output rank. Chen et al. explain the AI-search shift. AKS engineers the whole knowledge supply chain.

The three foundational papers

Three empirical studies anchor the framework. Each identifies a real mechanic of AI-mediated discovery; the framework turns those mechanics into an operating model organisations can implement responsibly.

Optimises the page

GEO: Generative Engine Optimization

Aggarwal et al. (2024)

Tests content-level methods — citations, statistics, quotations, fluency, readability — and finds they can lift visibility in generative answers by up to ~40%, while keyword stuffing performs poorly.

What AKS adds: turns page-level GEO tactics into a full six-phase knowledge-publication framework.

Optimises the output rank

CORE: Controlling Output Rankings in Generative Engines

Jin et al. (2026)

Shows LLM-based search rankings can be influenced by the retrieved content and its initial order — especially reasoning- and review-style text — and names the manipulation risk explicitly.

What AKS adds: reframes ranking influence as ethical, evidence-backed representation engineering rather than covert manipulation.

Explains the AI-search shift

Navigating the Shift: Web Search vs. Generative AI

Chen et al. (2026)

An empirical comparison showing AI answer engines diverge from Google in cited domains, source typology, freshness signals, and pre-training effects.

What AKS adds: converts the empirical findings into a practical model for publishing, structuring, validating, and monitoring AI-facing knowledge.

What the AI providers say

Beyond the academic literature, each major AI provider publishes official guidance on how its systems crawl, rank, and cite the web. The framework aligns with what providers state on the record, summarised here with links to the primary documentation.

Provider / surface | What official guidance says | Official source(s)

Google Search (AI Overviews + AI Mode)
Confidence: very high
Apply normal SEO fundamentals: publish unique, useful, people-first, non-commodity content; keep pages crawlable, indexable, and snippet-eligible; align visible text with schema; use internal linking and good page experience.
Not required: llms.txt or AI-specific text files, artificial content chunking, rewriting purely for AI, inauthentic mentions, or over-focusing on schema as an AI-specific lever.

OpenAI (ChatGPT Search + Atlas)
Confidence: high (access)
Allow OAI-SearchBot for ChatGPT Search discovery and citation; allow published OpenAI IPs through your CDN/WAF; keep the site public and crawlable; improve accessibility/ARIA for the ChatGPT agent in Atlas. Set separate policies for OAI-SearchBot, GPTBot (training), and ChatGPT-User.
No broad content-format playbook is published; format levers (FAQ schema, tables, answer blocks) are not officially validated for ChatGPT Search.

Anthropic (Claude)
Confidence: high (access)
Choose which Anthropic robots to allow by goal: ClaudeBot (possible model training), Claude-User (user-directed retrieval), Claude-SearchBot (search-result quality and visibility). Anthropic's bots respect robots.txt and Crawl-delay; robots.txt is the official opt-out (IP blocking may not reliably opt out).
No official public GEO/content-optimisation playbook exists for Claude; third-party Claude guides are interpretation, not provider confirmation.

Microsoft (Copilot + Bing AI)
Confidence: very high
The strongest official content-structure guidance: traditional SEO baseline plus schema (JSON-LD), clear headings, modular layouts, semantic clarity, measurable facts, bullets/numbers, concise answers, Q&A blocks, tables, and self-contained phrasing.
Flags as risks: long walls of text, answers hidden in tabs/expandables, core info trapped in PDFs or images, overloaded sentences, and unanchored claims.

Perplexity
Confidence: high (access)
Allow PerplexityBot in robots.txt and permit published IP ranges so the site can appear in Perplexity results; Perplexity-User supports user actions and can visit pages to provide accurate, linked answers. PerplexityBot is not used for foundation-model pre-training.
No full public content-structure playbook is published; blocked pages may still surface domain, headline, and a brief factual summary.
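
The access guidance above largely comes down to which crawler user agents a site's robots.txt admits. As a minimal sketch, the check can be run with Python's standard-library robots.txt parser. The sample policy, the example.com URL, and the audit_robots helper are hypothetical illustrations, not any provider's recommended configuration; the user-agent names are those listed in the table. IP allow-listing at the CDN/WAF is a separate step that this sketch does not cover.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy (an assumption for illustration): opt out of the
# training crawlers while leaving search and user-directed agents allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Crawl-delay: 2
Allow: /

User-agent: *
Allow: /
"""

# User-agent names as given in the provider table above.
AGENTS = [
    "OAI-SearchBot", "GPTBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
]


def audit_robots(robots_txt, url, agents=AGENTS):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    report = audit_robots(SAMPLE_ROBOTS_TXT, "https://example.com/guide")
    for agent, allowed in report.items():
        print(f"{agent:18} {'allowed' if allowed else 'blocked'}")
```

Under this sample policy only GPTBot and ClaudeBot are blocked; every other agent falls through to its own rule or to the wildcard entry. In practice the policy text would be fetched from the live site rather than hard-coded.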

Academic source bank

The full evidence base behind the framework — peer-reviewed papers and standards sources, each linked to its canonical version, with the contribution it makes to the six phases.

Paper / source | Primary relevance to the framework

Aggarwal et al. (2024). GEO: Generative Engine Optimization. [Academic paper]
Direct GEO evidence: citations, quotations, statistics, fluency, and content presentation improve visibility; keyword stuffing performs poorly.

Jin et al. (2026). Controlling Output Rankings in Generative Engines (CORE). [Academic paper]
LLM-based rankings are strongly influenced by retrieved content and initial retrieval order; content can shape output ranking.

Chen et al. (2026). Navigating the Shift: Web Search and Generative AI Response Generation. [Academic paper]
AI answer engines diverge from Google in cited domains, source typology, freshness, and pre-training effects.

Liu, Zhang & Liang (2023). Evaluating Verifiability in Generative Search Engines. [Academic paper]
Generative search requires citation recall and precision; unsupported statements and weak citations reduce trust.

Menick et al. (2022). Teaching Language Models to Support Answers with Verified Quotes (GopherCite). [Academic paper]
Open-book QA with specific evidence and quotes improves appraisal of correctness; uncertainty handling is part of trust.

Nakano et al. (2021). WebGPT: Browser-assisted Question-answering with Human Feedback. [Academic paper]
Web-browsing QA uses search, navigation, and references to support long-form answers.

Guu et al. (2020). REALM: Retrieval-Augmented Language Model Pre-Training. [Academic paper]
Retrieval-augmented models attend over documents, making accessible and retrievable content foundational.

Mialon et al. (2023). Augmented Language Models: a Survey. [Academic paper]
Augmented LMs use external tools and modules, including retrieval, expanding context beyond model parameters.

Brin & Page (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. [Academic paper]
Classic search architecture uses crawling, indexing, and hyperlink structure; connected information architecture matters.

Kumar, Shaik & Furqan (2019). A Survey on Search Engine Optimization Techniques. [Academic paper]
SEO literature supports crawlability, page structure, links, and technical hygiene, while GEO evidence shows classic keyword stuffing is insufficient.

Liu et al. (2023). G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment. [Academic paper]
LLM-based evaluation can assess subjective response quality using structured criteria; useful for monitoring AI visibility.

Wan, Wallace & Klein (2024). What Evidence Do Language Models Find Convincing? [Academic paper]
RAG models rely heavily on query relevance; corpus and evidence quality are central to trustworthy outputs.

Qin et al. (2024). LLMs are Effective Text Rankers with Pairwise Ranking Prompting. [Academic paper]
LLMs can operate as rankers; pairwise ranking supports benchmarking and comparative visibility measurement.

Schema.org / Google Structured Data Documentation. [Standards source]
Structured data gives machines explicit entity and relationship metadata using shared vocabularies.
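
To make the structured-data entry concrete, the sketch below builds a small schema.org FAQPage object as JSON-LD. It is an illustration under stated assumptions, not a validated lever: the build_faq_jsonld helper and the sample question are hypothetical, and the provider guidance summarised earlier notes that FAQ schema is not officially validated for ChatGPT Search and that schema should match the text visible on the page.

```python
import json


def build_faq_jsonld(qa_pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


# Hypothetical page content (an assumption for illustration). The same
# question and answer should also appear as visible text on the page.
VISIBLE_QA = [
    (
        "Which crawler does ChatGPT Search use for discovery?",
        "OAI-SearchBot, which is separate from GPTBot (training) and ChatGPT-User.",
    ),
]

if __name__ == "__main__":
    # Serialise for embedding in a <script type="application/ld+json"> element.
    print(json.dumps(build_faq_jsonld(VISIBLE_QA), indent=2))
```

Generating the markup from the same data that renders the visible Q&A block is one way to keep the two aligned.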

Methodology & scope

How this evidence base was assembled

Compiled from the AI Knowledge Signal GEO Framework Evidence Review (prepared 5 May 2026). The base comprises three core empirical papers (Aggarwal et al. 2024; Jin et al. 2026, CORE; Chen et al. 2026), supported by eleven further academic and standards sources and the official guidance of five AI providers, mapped across the framework's six phases. Official provider statements are distinguished from third-party interpretation; where a provider has not published guidance on a topic, that gap is stated rather than inferred. Links resolve to the canonical version of each source (DOI, arXiv, ACL Anthology, or the provider's own documentation).

Related

See the Glossary for canonical definitions of the terms used above, What You Get for how the framework is delivered, or the FAQ for how AI training and retrieval pipelines work.

AI Knowledge Signal is a product of Digital Human Assistants.