
llms.txt: How to Create the Essential File That Helps AI Understand Your Website

A step-by-step guide to creating and implementing llms.txt and llms-full.txt files for your website. Learn the specification, structure, best practices, and how this file helps AI models accurately cite your content in generative search results.

AI Models Are Reading Your Website. Are You Making It Easy for Them?

Every major AI system, from GPT to Claude to Gemini, relies on crawlers and retrieval tools that read the web to generate answers. But here’s what most site owners don’t realize: your beautifully designed website is a nightmare for AI to parse. Navigation menus, sidebars, footers, cookie banners, embedded scripts, and ad containers all compete with the content. The actual content that matters might represent 30% of your page’s HTML or less. The rest is noise that burns through an AI model’s context window without contributing useful information.

In September 2024, Jeremy Howard of Answer.AI proposed a solution: a simple markdown file called llms.txt that gives AI models a clean, curated index of your site’s most important content. It’s an elegant idea that has gained serious traction. Vercel, Cloudflare, Anthropic, and dozens of other organizations have adopted it. Chrome Lighthouse added an “Agentic Browsing” check for llms.txt in May 2026. And if you’re running an AI Agency or working in Generative Engine Optimization, implementing llms.txt is now a baseline requirement.

I’ve deployed llms.txt files across every client site we manage, and this guide covers everything I’ve learned about doing it well.

What Is llms.txt?

llms.txt is a markdown file hosted at your website’s root (e.g., yoursite.com/llms.txt) that provides a curated, structured overview of your site specifically designed for consumption by large language models. It was proposed by Jeremy Howard of Answer.AI as a pragmatic solution to a real problem: HTML is designed for browsers, not for AI.

Think of it this way. robots.txt tells crawlers what they can access. sitemap.xml tells crawlers what pages exist and when they changed. llms.txt tells AI models what your site is about and where to find the important content, in a format they can process efficiently.

The key distinction is purpose. robots.txt is about access control. sitemap.xml is about discovery. llms.txt is about comprehension. It’s the file that helps an AI model understand your site’s structure, content hierarchy, and key offerings without parsing hundreds of HTML pages.

Why llms.txt Matters for Your Site

The Noisy HTML Problem

When an AI model retrieves a web page during RAG (Retrieval-Augmented Generation), it needs to extract the meaningful content from the raw HTML. Modern websites make this surprisingly difficult.

A typical page includes the primary navigation with dozens of links, a sidebar with related posts or ads, footer links to privacy policies and terms, cookie consent banners, JavaScript bundles, schema markup, social sharing widgets, newsletter signup forms, and the actual content buried somewhere in between. For a 2,000-word blog post, the meaningful content might be 12,000 characters. The total HTML might be 80,000 characters.

AI models have finite context windows. Every token spent processing navigation markup and cookie banners is a token not spent understanding your actual content. llms.txt solves this by providing a clean, pre-processed index in plain markdown, no HTML parsing required.
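To put a number on this for your own pages, a rough measurement helps. The sketch below uses Python’s standard-library HTMLParser to compare visible text against total HTML length. The names VisibleTextExtractor and content_ratio are illustrative, not part of any tool, and the result is an upper bound on useful content because navigation and footer text still count as visible. With the figures above, 12,000 of 80,000 characters works out to a ratio of 0.15.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping tags whose contents are never displayed."""

    SKIPPED_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def content_ratio(html: str) -> float:
    """Return visible-text characters divided by total HTML characters."""
    if not html:
        return 0.0
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    visible = " ".join(parser.chunks)
    return len(visible) / len(html)
```

Running content_ratio on the saved HTML of a few representative pages gives a quick sense of how much noise a model has to wade through on your site.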

The Comprehension Advantage

llms.txt doesn’t just save tokens. It provides contextual framing that raw page crawling cannot. When an AI model reads your llms.txt, it immediately understands what your site covers, how content is organized, and which pages are most important. This context helps the model make better retrieval decisions when answering user queries.

For any AI Agency focused on getting client content cited in AI-generated answers, this comprehension advantage is significant. A model that understands your site’s structure and authority areas is more likely to retrieve and cite your content accurately.

Is llms.txt a Ranking Factor?

Let me be direct about this: llms.txt is not a ranking factor for traditional search engines. Google has not indicated it uses llms.txt for ranking purposes, and there’s no evidence that having one improves your Google search positions.

However, calling it “just a nice-to-have” undersells its importance. llms.txt is a high-signal optimization for AI-generated search results. It helps AI models understand and cite your content more accurately in platforms like ChatGPT, Perplexity, and Gemini. As AI-generated answers take a growing share of search discovery, the return on implementing llms.txt grows with it.

Think of it the way we thought about meta descriptions in the early days of SEO. Not a direct ranking factor, but a critical signal that influences how your content appears in results.

The llms.txt Specification

The llmstxt.org specification defines a straightforward structure. Here’s how to build one.

Required Elements

H1 Heading. Your file starts with a single H1 heading containing your site or organization name. This is the top-level identifier that tells AI models whose content they’re reading.

Blockquote Summary. Immediately after the H1, include a blockquote with a concise description of your site, what it covers, and who it serves. Keep this to two to three sentences. Strictly speaking, the specification requires only the H1 and treats the blockquote as optional, but you should always include it: this summary gives AI models the essential context they need to understand your site’s scope and authority.

Section Structure

After the header, organize your content into H2 sections. Each section groups related pages by topic or content type. Within each section, list your important pages as markdown links with brief descriptions.

The format for each entry is a standard markdown link followed by a colon and a short description. For example: [Page Title](URL): Brief description of what this page covers.

Example Structure

Here’s a simplified example of what a well-structured llms.txt looks like:

# Your Company Name

> Brief description of your company and what your site covers.
> Include your primary value proposition and content focus areas.

## Core Services

- [Service Overview](/services/): Description of your main service offerings
- [AI Automation](/services/ai-automation/): How you help businesses automate workflows with AI agents

## Blog

- [AI Agency Guide](/blog/ai-agency-guide/): Comprehensive guide to what AI agencies do
- [GEO Complete Guide](/blog/geo-guide/): How to optimize for AI-generated search results

## Documentation

- [API Reference](/docs/api/): Technical API documentation
- [Getting Started](/docs/getting-started/): Quickstart guide for new users
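One way to sanity-check a file against this structure is a small parser. The following is a minimal sketch, not a reference implementation of the llmstxt.org specification: the function name parse_llms_txt and the regular expression are assumptions, and it only recognizes the H1, the blockquote, H2 headings, and list entries in the link format shown above.

```python
import re

# Matches "- [Title](URL): Description"; the description part is optional.
LINK_ENTRY = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?:: (?P<desc>.+))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Parse the H1, blockquote summary, and H2 link sections of an llms.txt."""
    result = {"title": None, "summary": [], "sections": {}}
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and current is None:
            # Blockquote lines before the first H2 form the summary.
            result["summary"].append(line[2:].strip())
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_ENTRY.match(line)
            if match:
                result["sections"][current].append(match.groupdict())
    result["summary"] = " ".join(result["summary"])
    return result
```

A missing title (None) or an empty sections dictionary in the result is a sign the file has drifted from the expected layout.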

What to Include and What to Leave Out

Include: Your most important pages, cornerstone content, key service pages, high-value blog posts, and documentation. These are the pages you want AI models to know about and cite.

Leave out: Every individual blog post you’ve ever written, admin pages, login pages, duplicate content, paginated archives, tag pages, and low-value utility pages. llms.txt is a curated index, not a comprehensive sitemap. That’s what sitemap.xml is for.

The curation is the point. By selecting your best content, you signal to AI models which pages represent your highest-quality, most authoritative work. This is especially important for an AI Agency with dozens of blog posts, where you want AI models to prioritize your definitive guides over tangential posts.

Creating llms-full.txt

The optional companion file, llms-full.txt, takes the concept further. Where llms.txt provides an index with links, llms-full.txt provides the actual content of your key pages concatenated into a single markdown file.

Why llms-full.txt Exists

Some AI models and applications can process a single large text file more efficiently than crawling multiple individual pages. llms-full.txt gives these models your complete content in one request, formatted in clean markdown without any HTML noise.

How to Structure It

llms-full.txt follows the same organizational structure as your llms.txt but includes the full content of each listed page. Separate each page’s content with clear headers and horizontal rules so the model can distinguish between different pages.

# Your Company Name - Full Content

---

## AI Agency Guide

[Full content of your AI Agency guide in clean markdown]

---

## GEO Complete Guide

[Full content of your GEO guide in clean markdown]
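Assembling this file by hand gets tedious quickly, so it is worth scripting. Here is a minimal sketch of the concatenation step, assuming each key page already exists as a clean markdown file on disk. The function name build_llms_full and the (title, path) input shape are hypothetical choices for illustration.

```python
from pathlib import Path


def build_llms_full(site_name: str, pages: list[tuple[str, Path]]) -> str:
    """Concatenate markdown pages under H2 headers separated by horizontal rules."""
    parts = [f"# {site_name} - Full Content"]
    for title, path in pages:
        body = path.read_text(encoding="utf-8").strip()
        # One horizontal rule and one H2 header per page, as in the layout above.
        parts.append(f"---\n\n## {title}\n\n{body}")
    return "\n\n".join(parts) + "\n"
```

Pass the pages in the same order as your llms.txt, then write the returned string to llms-full.txt in your build output. If the source files contain their own H1 or H2 headings, you may want to demote them first so page boundaries stay unambiguous.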

Practical Considerations

File size. llms-full.txt can become very large. For sites with extensive content, prioritize your most important pages and keep the file under 100,000 tokens. Most AI models have context limits, and a file that exceeds them won’t be fully processed.
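A quick budget check can run alongside whatever generates the file. The sketch below relies on an assumed average of four characters per token, which is only a rough rule of thumb for English text and not a real tokenizer, so treat the result as an estimate. Both function names are illustrative.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; chars_per_token is an assumed average, not a tokenizer."""
    return int(len(text) / chars_per_token)


def within_budget(text: str, max_tokens: int = 100_000) -> bool:
    """Return True if the estimated token count is at or under max_tokens."""
    return estimate_tokens(text) <= max_tokens
```

If the check fails, trim the lowest-priority pages first instead of truncating the file mid-page.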

Maintenance. This file needs to be regenerated whenever your key content changes. Automate this if possible. Static site generators like Astro, which we use for our own site, make this straightforward through build-time scripts that concatenate markdown source files.

Formatting. Strip all HTML, images, and interactive elements. The content should be pure markdown text. Include headings, lists, bold text, and links, but remove anything that requires rendering.

How llms.txt Fits Into Your GEO Stack

llms.txt doesn’t work in isolation. It’s one component of a broader Generative Engine Optimization strategy. Here’s how the key files work together.

The File Ecosystem

robots.txt controls access. It tells AI crawlers (GPTBot, ClaudeBot, PerplexityBot) whether they’re allowed to crawl your site at all. This is the gatekeeper. If you block an AI crawler in robots.txt, no amount of llms.txt optimization will matter for that platform.

sitemap.xml guides discovery. It tells crawlers (both traditional and AI) what pages exist, when they were last updated, and, optionally, their relative priority. AI crawlers that read sitemaps can use them the same way Googlebot does.

llms.txt enables comprehension. Once an AI crawler has access (robots.txt) and knows what pages exist (sitemap.xml), llms.txt helps it understand your site’s structure, content hierarchy, and most important pages.

ai.txt declares preferences. It communicates your preferences about how AI systems use your content, covering training, retrieval, citation, and derivative works.

Structured data (JSON-LD) provides semantic meaning. FAQPage and HowTo schema markup tells AI models exactly what type of content exists on each page, making extraction more accurate.

Together, these files create a comprehensive AI-readability layer on top of your existing website. For any AI Agency implementing content-driven growth strategies, deploying this full stack is table stakes in 2026.
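Of the pieces above, structured data is the one most often generated in code. As a hedged illustration, the sketch below serializes question and answer pairs into a schema.org FAQPage document using Python’s json module. The function name faq_jsonld is hypothetical; the property names (mainEntity, acceptedAnswer) come from the schema.org vocabulary.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize question/answer pairs as a schema.org FAQPage JSON-LD document."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)
```

The returned string goes inside a script tag of type application/ld+json in the page’s HTML.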

Who Has Adopted llms.txt?

The adoption curve for llms.txt has been faster than most web standards. Notable organizations with llms.txt files include:

Vercel uses llms.txt to help AI models understand their documentation, framework guides, and platform capabilities. Given that Vercel hosts millions of web applications, their adoption signals confidence in the format’s value.

Cloudflare adopted llms.txt across their documentation and developer resources. As an infrastructure company serving a significant portion of the web, their implementation influences how other organizations think about AI readability.

Anthropic, the company behind Claude, maintains an llms.txt file. When the maker of a leading AI model implements llms.txt on their own site, it’s a strong validation of the format’s practical value.

The Chrome Lighthouse “Agentic Browsing” audit, added in May 2026, checks for the presence of llms.txt as part of its accessibility and best practices evaluation. While not a pass/fail criterion, its inclusion in Lighthouse signals that llms.txt is moving from experimental to expected.

Best Practices From Real Implementations

After implementing llms.txt across multiple client sites, here are the patterns that produce the best results.

Keep It Curated, Not Comprehensive

The biggest mistake I see is treating llms.txt like a second sitemap. If you list every page on your site, you’ve defeated the purpose. The whole point is curation, telling AI models which content represents your best work.

For a site with 100 blog posts, your llms.txt might include 15 to 20 of your strongest, most comprehensive articles. On a site that compares LLMs, for example, I’d include the definitive comparison guide but not every individual model review.

Write Descriptions That Add Context

Don’t just list page titles and URLs. The description after each link should tell the AI model what it will find and why it matters. Compare these two approaches:

Weak: [AI Agency Guide](/blog/ai-agency-guide/): A guide about AI agencies.

Strong: [AI Agency Guide](/blog/ai-agency-guide/): Comprehensive overview of what AI agencies do in 2026, covering agentic AI deployment, LLM application development, AI strategy consulting, and managed AI operations for enterprises.

The strong description gives the AI model enough context to determine whether this page is relevant to a user’s query without actually visiting the page first.

Update Regularly

llms.txt should reflect your current best content. When you publish a new cornerstone article, add it. When you significantly update existing content, update the description. When old content becomes outdated, remove it.

I recommend reviewing llms.txt monthly or whenever you publish a major new piece. For sites using automated build processes, you can integrate llms.txt generation into your CI/CD pipeline so it’s always current.
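For the automated route, the generation step itself is small. This is a sketch under the assumption that your build already has the curated list of pages available as plain data. The function name render_llms_txt and the (title, URL, description) tuple shape are illustrative, not part of any particular static site generator.

```python
def render_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt: H1, blockquote summary, then H2 sections of link entries."""
    lines = [f"# {name}", "", f"> {summary}"]
    for heading, entries in sections.items():
        lines += ["", f"## {heading}", ""]
        for title, url, description in entries:
            lines.append(f"- [{title}]({url}): {description}")
    return "\n".join(lines) + "\n"
```

Keeping the curated list in a versioned data file means every change to llms.txt shows up in code review, and the rendered file can be written to the site root on each build.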

Use Consistent Category Structure

AI models benefit from predictable structure. If your llms.txt organizes content under “Blog,” “Services,” and “Documentation” sections, maintain those categories consistently. Don’t reorganize sections frequently; a stable layout is easier for any tool or pipeline that reads your llms.txt repeatedly to rely on.

Implementing llms.txt: Step by Step

Here’s my recommended implementation process, the same one I follow for every AI Agency client engagement.

Step one. Audit your existing content. Identify your 15 to 25 most important, highest-quality pages. These are your llms.txt candidates.

Step two. Write your H1 and blockquote summary. Be specific about what your site covers and who it serves. This is the first thing AI models read, so make it count.

Step three. Organize candidates into logical H2 sections. Group by content type (blog, services, docs) or by topic area, whichever better represents your site’s structure.

Step four. Write contextual descriptions for each entry. Each description should be one to two sentences that tell an AI model exactly what the page covers and why it’s authoritative.

Step five. Deploy the file at your site root. Verify it’s accessible at yoursite.com/llms.txt. Test with a simple curl request or browser visit.

Step six. Create llms-full.txt if you have the content volume to justify it. Automate generation from your content source files if possible.

Step seven. Verify your robots.txt allows AI crawlers to access /llms.txt and /llms-full.txt. These files are useless if crawlers can’t reach them.

Step eight. Add llms.txt to your ongoing content maintenance workflow. Review monthly and update when your content library changes meaningfully.
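The robots.txt check in step seven can be done offline before you deploy. The sketch below uses Python’s standard-library urllib.robotparser to test whether given user agents may fetch given paths. The function name blocked_agents is hypothetical, and the agent names you pass in should match the crawlers you care about.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(
    robots_txt: str, agents: list[str], paths: list[str]
) -> list[tuple[str, str]]:
    """Return (agent, path) pairs that the given robots.txt content disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path)
        for agent in agents
        for path in paths
        if not parser.can_fetch(agent, path)
    ]
```

For example, calling it with the agents GPTBot, ClaudeBot, and PerplexityBot and the paths /llms.txt and /llms-full.txt should return an empty list if nothing is blocked.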

What’s Next for llms.txt

The specification is still evolving. As AI models become more sophisticated and AI-powered search captures more discovery traffic, expect the llms.txt ecosystem to mature.

Potential developments include standardized tooling for automated llms.txt generation, CMS plugins that maintain the file automatically, analytics platforms that track how AI models interact with your llms.txt, and tighter integration with emerging AI payment protocols for content monetization.

The future of AI search visibility belongs to sites that proactively make their content AI-readable. llms.txt is one of the simplest, highest-impact steps you can take today.

If you’re evaluating your site’s AI readiness, start with llms.txt. It takes an afternoon to implement, costs nothing, and positions your content for the AI-powered search landscape that’s already here.

