Your SEO audit flagged a missing llms.txt file. The tool positioned it as critical for AI visibility. You assumed it was the next checkbox in the optimisation playbook.
It's not. It's a proposed standard with zero adoption from the platforms that matter.
The Pitch vs Reality
Australian data scientist Jeremy Howard introduced llms.txt in September 2024 as a "robots.txt for large language models". The idea: a markdown file at your domain root listing pages you want AI tools to prioritise. Strip away navigation chrome, serve clean text, reduce token overhead.
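For context, the proposal is just structured markdown: an H1 name, a blockquote summary, then H2 sections of annotated links. A minimal sketch with made-up URLs, following the format published at llmstxt.org:

```markdown
# Example Co

> Example Co sells project-management software. The pages below are the
> cleanest sources for questions about pricing, features, and setup.

## Docs

- [Pricing](https://example.com/pricing.md): current plans and rates
- [Setup guide](https://example.com/setup.md): installation walkthrough

## Optional

- [Blog](https://example.com/blog.md): long-form articles, safe to skip
```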
SEO tools adopted it quickly. Yoast SEO, GitBook, and major CMS platforms rolled out automatic generation.
Semrush started flagging missing files in audits. The SEO community treated it as essential for AI search visibility.
One problem: the platforms never signed on.
In April 2025, John Mueller addressed the confusion on Reddit. Someone asked why Semrush flagged their missing llms.txt file. Mueller's response: "It's like the meta keywords tag. No one uses it." He clarified that Google, OpenAI, and Anthropic don't request the file. The bots aren't checking for it.
Gary Illyes reinforced this in July at a Search Central event: "To get your content to appear in AI Overview, simply use normal SEO practices. You don't need GEO, LLMO or anything else."

The tools are selling a feature the platforms don't support. That's the gap.
Why the Model Doesn't Fit & The Real Opportunity
Search engines crawl your entire site upfront, then index what they find. That's why robots.txt exists. You need a way to block sections during the crawl phase.
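A crawler requests that file once, before fetching anything else, and honours directives like these:

```txt
# Standard robots.txt syntax: block whole sections upfront, during the crawl
User-agent: *
Disallow: /checkout/
Disallow: /internal/
```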
LLMs don't crawl. They retrieve content at inference time, the moment someone asks a question. The model fetches specific URLs that seem relevant, parses the HTML in real time, and extracts chunks that might answer the query.
A static file listing "priority pages" solves a problem this workflow doesn't have. The retrieval logic already uses ranking signals, embeddings, and link context to decide which URLs to fetch. Adding another file (one the model has to request, parse, and weight against existing signals) introduces overhead without clear gain.
The platforms had the option to build support for LLM.txt. They chose not to. That's not a bug or a delay. It's a design decision based on how their systems already work.
AI-driven traffic is tiny by volume. Less than 1% for most sites. But Ahrefs published data showing their AI referrals convert above 10%, far higher than organic search or social.
The pattern: AI-driven visitors arrive with high intent. They've already filtered options through a conversational query. They're at the decision stage.
That reframes where to optimise. You're not chasing scale. You're chasing citation quality. When someone asks Perplexity or ChatGPT a question, you want your content surfaced as the answer and cited as the source.
The mechanism isn't a file at your root directory. It's how you structure every page.
Bing's Krishna Madhavan explained this in October 2025. AI assistants don't read pages top to bottom. They chunk content into discrete blocks (paragraphs, sections, lists) and scan those chunks for relevance. Models extract the best match and synthesise a response.
If your content isn't chunked in a way that isolates clear answers, models skip it during retrieval.
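No platform publishes its retrieval code, so here's a toy Python sketch of the general shape: chunk a page at its headings, score each chunk against the query, keep the best match. Real systems score with embeddings rather than word overlap; this is an illustration, not anyone's actual pipeline.

```python
import re

def chunk_by_headings(page: str):
    """Split a markdown page into (heading, body) chunks at each heading."""
    parts = re.split(r"^(#{1,6} .+)$", page, flags=re.MULTILINE)
    return [(parts[i].strip(), parts[i + 1].strip())
            for i in range(1, len(parts), 2)]

def relevance(chunk, query: str) -> float:
    """Crude stand-in for embedding similarity: query-word overlap."""
    heading, body = chunk
    chunk_words = set(re.findall(r"\w+", f"{heading} {body}".lower()))
    query_words = set(re.findall(r"\w+", query.lower()))
    return len(chunk_words & query_words) / len(query_words)

page = """# Widget pricing

## What does a widget cost?
A widget costs $10. Volume discounts start at 100 units.

## Company history
Founded in 1999 by two people in a garage.
"""

best = max(chunk_by_headings(page),
           key=lambda c: relevance(c, "how much does a widget cost"))
print(best[0])  # -> ## What does a widget cost?
```

Notice which chunk wins: the one whose heading states the question and whose first sentence answers it. That's the whole argument for the checklist below.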
What to Build Instead, Threshold Check & Bottom Line
Here's the checklist we run on client sites where AI citation matters:
- Write sections as question-answer pairs. Lead with the question readers ask and answer it in the first sentence; expand only if supporting detail adds value. This makes each chunk retrieval-ready.
- Use headings that declare the question. Not "Benefits" or "Features". Write "What are the main benefits?" or "What features are included?" Headings signal chunk intent to models during parsing.
- Put the answer first, evidence second. Front-load the core claim or instruction at the top of each section; caveats, examples, and supporting data come after. Models prioritise early text when scoring relevance. (The first three items combine into the example after this list.)
- Add schema where it fits. FAQPage schema marks question-answer pairs explicitly; HowTo schema does the same for step-by-step guides. These aren't ranking factors, but they help models parse structure faster during inference. (See the JSON-LD block after this list.)
- Keep sections between 150 and 300 words. Longer blocks dilute relevance; shorter blocks lack context. Test each section in isolation: can it stand alone as a complete answer?
- Publish accessible versions of gated content. If critical information sits behind a paywall or in a PDF, mirror it in public HTML. Models won't parse paywalls or complex document formats during real-time retrieval.
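Applied together, the first three items produce sections shaped like this (content invented for illustration):

```markdown
## How long does onboarding take?

Onboarding takes two weeks for most teams. Week one covers account
setup and data import; week two covers training. Teams over 50 seats
should budget a third week for SSO configuration.
```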
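And for the schema item, a minimal FAQPage block in JSON-LD (question and answer are illustrative):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "How long does onboarding take?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Onboarding takes two weeks for most teams."
    }
  }]
}
</script>
```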
llms.txt could start to matter if any of the following happens:
- A major platform announces support and begins requesting the file
- Server logs show measurable bot traffic to /llms.txt endpoints
- Sites with llms.txt demonstrate higher citation rates in AI responses than control groups
None of those exist today.
SEO tools flagging missing llms.txt files are selling infrastructure the platforms don't support. It's a proposed standard that arrived after the systems were built, and no platform has retrofitted support.
The real optimisation happens at the page level. Structure content so every section answers a discrete question, uses semantic headings, and front-loads the answer. That works in ChatGPT, Perplexity, and Google AI Overviews today. It'll work when the next model launches.
We've built this system for client sites over the past year. The complete framework (content structure templates, schema implementation guides, and audit checklists) is available in our Agent-Optimised SEO System.