This Week in Brief
Google formalized its GEO guidance with an official optimization document published 15 May 2026, the company's most explicit on-record statement so far about what influences AI feature citations. At the same time, Google I/O confirmed that AI Mode has surpassed one billion monthly users, a scale at which content teams can no longer treat AI search visibility as optional. Two new retrieval papers from Meta and CUHK offer practitioners concrete architectural clues about how next-generation RAG agents select and compile source content.
AI Lab Signals
Google publishes first official guide to optimizing for generative AI search features
On 15 May 2026, Google released 'Optimizing your website for generative AI features on Google Search,' announced by John Mueller via Search Central Blog and housed under a new 'Generative AI fundamentals' navigation section. This is Google's most explicit, on-record statement to date about what signals it considers when surfacing content in AI Overviews and related features. Practitioners should treat this document as the canonical reference for technical and content decisions targeting AI feature visibility.
Google AI Mode surpasses one billion monthly users; queries doubling every quarter
At Google I/O 2026, the company confirmed AI Mode has exceeded one billion monthly users, with query volume more than doubling every quarter since launch and overall queries reaching an all-time high last quarter. Google also announced new agentic capabilities accessible directly through a query, and described the Search box as undergoing its 'biggest upgrade in over 25 years.' The scale data confirms that AI-mediated search is now a mainstream distribution channel, not an experimental one.
Google AI Overviews now live in 120+ countries across 11 languages, powered by Gemini
A detailed breakdown published 14 May 2026 confirms AI Overviews use customized Gemini models layered on top of Google's existing Search ranking infrastructure, with a five-stage pipeline: intent classification, document selection, passage extraction, answer generation, and safety filtering. The piece reports that AI Overviews already appear on 18% of all Google queries and 57% of long-tail queries, and that 93% of searches in Google's own AI Mode end without a click. Practitioners optimizing for citation should note that, in this pipeline, passage-level extractability rather than page-level ranking is the primary retrieval variable.
Training Data & Crawl
French Premium Web Corpus v1.3 released under EU AI Act Article 10 compliance framework
FINALEADS LLC published the dataset specification for FPWC v1.3.0-2026-05-16, a vertical French-language training corpus drawn exclusively from EU public sources covering finance, regulation, and economics. The release is documented to satisfy EU AI Act Article 10 / Annex IV data governance obligations, with a published SHA-256 pipeline content hash and methodology grounded in Gebru et al.'s Datasheets for Datasets framework. For GEO practitioners targeting French-language AI systems, this corpus release signals which source categories are being collected for model training in regulated EU verticals: authoritative regulatory and financial sources appear to carry structural weight in this pipeline.
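A published SHA-256 hash is only useful if a downstream team actually checks it. The sketch below shows the generic check with Python's standard hashlib module. It assumes the published value is a plain SHA-256 digest over the bytes of a single file; the item above does not say how the FPWC "pipeline content hash" is computed, so the function names and that assumption are illustrative only.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a local file against a published digest (hypothetical helper).

    Assumption: the published value is a hex digest of this exact file's bytes.
    Whitespace and letter case in the published string are ignored.
    """
    return sha256_of_file(path) == published_hex.strip().lower()
```

A mismatch means the local copy differs from whatever was hashed at publication time, which is worth knowing before citing the corpus as evidence of anything.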
AI Search & ASO
Perplexity's citation model and publisher relationship clarified in 2026 product overview
A 2026 product analysis of Perplexity confirms the platform operates as a real-time retrieval and synthesis engine distinct from Google's AI Overviews, with a different citation selection mechanism and an explicit 'publisher question' around revenue sharing and content access. Practitioners targeting Perplexity citations should note that the platform favors directly answerable, well-structured content with clear sourcing, and that its user base is growing as a Google alternative for research-style queries. The practical consequence is that citation strategy has to be platform-specific: a one-size-fits-all GEO approach is insufficient across ChatGPT, Perplexity, and Google AI Mode.
Less than 20% overlap confirmed between top Google organic results and AI-cited sources
Practitioner analysis published in May 2026 cites research finding fewer than 20% of sources cited in AI-generated answers overlap with top-10 Google organic rankings, reinforcing that traditional SEO rank is a poor proxy for AI citation likelihood. The same analysis notes AI engines favor content that answers queries clearly within the first 200 words, and that brand mentions across YouTube, Reddit, and Wikipedia now influence AI visibility more strongly than conventional backlinks. Teams should audit their content for answer-first structure and off-site entity authority, not just domain authority scores.
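Teams that want to check the overlap figure against their own queries can compute it directly. The following is a minimal sketch, assuming you have already collected two URL lists per query by hand or from your own tooling: the sources an AI answer cited and the top-10 organic results. The function names and the normalization rules are hypothetical choices, not taken from the cited analysis.

```python
from urllib.parse import urlparse


def normalize(url):
    """Reduce a URL to host + path so trivial variants compare equal.

    Drops the scheme, a leading "www.", query strings, fragments, and a
    trailing slash. The whole URL is lowercased, including the path, which
    is a simplification.
    """
    parsed = urlparse(url.strip().lower())
    host = parsed.netloc.removeprefix("www.")  # requires Python 3.9+
    return host + parsed.path.rstrip("/")


def citation_overlap(ai_cited, organic_top10):
    """Share of AI-cited sources that also appear in the organic top 10."""
    cited = {normalize(u) for u in ai_cited}
    organic = {normalize(u) for u in organic_top10}
    if not cited:
        return 0.0
    return len(cited & organic) / len(cited)
```

Averaging this ratio over a representative query set gives a site-specific version of the headline number, which is likely more useful for planning than the industry-wide figure.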
Research Radar (arXiv)
Superintelligent Retrieval Agent: The Next Frontier of Information Retrieval
The paper introduces SIRA (SuperIntelligent Retrieval Agent), which reframes retrieval quality as the ability to compress multi-round exploratory search into a single corpus-discriminative retrieval action — selecting terms that distinguish the desired document from the rest of the corpus, rather than terms that are merely topically relevant. For GEO practitioners, this has a direct structural implication: content that uses precise, corpus-discriminative language (specific named entities, exact figures, unambiguous terminology) is more likely to be selected by next-generation retrieval agents than content written in generic topical prose. (Pre-publication / arXiv)
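To make "corpus-discriminative" concrete, the sketch below ranks a passage's terms by a plain TF-IDF score against a small comparison corpus. This is a long-standing proxy swapped in for illustration, not SIRA's method, which the summary above does not specify in enough detail to reproduce. The tokenizer and function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def discriminative_terms(target, corpus, k=3):
    """Return the k terms of `target` with the highest TF-IDF score.

    Document frequency is counted over the corpus plus the target itself,
    so a term that appears in every document scores log(1) = 0.
    Ties are broken alphabetically to keep the output deterministic.
    """
    docs = [set(tokenize(d)) for d in corpus + [target]]
    n_docs = len(docs)
    term_counts = Counter(tokenize(target))
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for d in docs if term in d)
        scores[term] = count * math.log(n_docs / doc_freq)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Run against competitor pages on the same topic, terms that score near zero are the generic topical vocabulary everyone shares; the high scorers are the named entities and figures that set a passage apart.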
SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution
SkillRAE proposes a two-stage retrieval-augmented execution framework that separates skill retrieval from context compilation, focusing on how retrieved content is organized into a compact, grounded, and immediately usable form for downstream task execution. The research highlights that how evidence is structured after retrieval — not only whether it is retrieved — determines downstream output quality. For ASO and GEO practitioners, this supports the tactical priority of writing content in modular, self-contained passages that can be extracted and compiled without editorial transformation by an AI agent. (Pre-publication / arXiv)
Practitioner Takeaway
Audit your highest-value pages for answer-first structure this week: the answer to the primary query should appear in full within the first 200 words, before any narrative context, author background, or internal linking blocks. Google's newly published optimization guide, the sub-20% overlap finding between organic rankings and AI citations, and SIRA's retrieval architecture all converge on the same signal: AI retrieval systems select passages, not pages, and they favor content that is immediately extractable without requiring the model to infer or paraphrase. Pair this with a pass on entity specificity: replace generic category language ('a leading provider of...') with named entities, cited statistics, and unambiguous terminology that functions as a corpus-discriminative signal in retrieval pipelines.
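The audit above can be roughed out in a few lines before anyone reads pages by hand. This is a minimal heuristic sketch: it assumes you supply, per page, the key terms a complete answer must contain, and it only checks whether those terms occur in the first 200 whitespace-separated words. Apart from 'a leading provider of', the generic phrase list is a hypothetical starting point.

```python
# Hypothetical starter list; only the first entry comes from the takeaway above.
GENERIC_PHRASES = ("a leading provider of", "world-class", "best-in-class")


def audit_page(text, answer_terms, limit=200):
    """Flag pages whose answer terms fall outside the first `limit` words.

    Returns a dict with:
      answer_first    -- True if every answer term occurs in the opening
      missing_terms   -- answer terms absent from the opening
      generic_phrases -- generic phrases found anywhere on the page
    Matching is case-insensitive substring matching, so it is only a
    first-pass filter, not a judgment about answer quality.
    """
    opening = " ".join(text.split()[:limit]).lower()
    missing = [t for t in answer_terms if t.lower() not in opening]
    generic = [p for p in GENERIC_PHRASES if p in text.lower()]
    return {
        "answer_first": not missing,
        "missing_terms": missing,
        "generic_phrases": generic,
    }
```

Pages that fail the check are candidates for manual review, not automatic rewrites: a term match in the opening does not prove the answer is complete, and a miss may just mean the term list needs adjusting.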
The 6-phase framework used to structure this newsletter is available as a complete methodology guide — including audit tools, templates, and implementation checklists.
Get the Framework: $20/mo or $200/yr
New to AI knowledge publication? Download the free briefing flyer: the data case for why your organisation cannot wait.