You have 3,000 documents. PDFs, Word files, Markdown, ebooks, notes. You want AI Agents to find and read them when needed. How?
The mainstream answer is RAG: split documents into chunks, vectorize them, store them in a vector database. When a user asks a question, retrieve the most relevant chunks and stuff them into the prompt. This approach has been widely adopted over the past two years, but it has a fundamental problem — the AI only ever gets pre-cut fragments, never the documents themselves. Fragmented document chunks ultimately pollute the AI’s context, leading to a quagmire of accuracy and maintainability issues. That’s why in 2025, we kept seeing claims that “RAG is dead.” Once AI Agents like Claude Code achieved remarkable results using primitive tools like Grep and Glob for retrieval, RAG seemed to fade from attention.
But retrieval is a fundamental capability. Drawing inspiration from the progressive disclosure techniques used in Skills, we implemented a novel retrieval approach called Outlines Index.
The core idea in one sentence: Build a structured “card” for each document, letting AI access documents progressively via search → outline → read, instead of dumping chunked fragments all at once.
Where the Problem Lies
AI Agents face two obstacles when accessing users’ local unstructured documents:
- Obstacle 1: Paths carry no semantics. In a codebase, `src/services/auth.ts` tells you right away this is authentication service code. But `2024-Annual-Report(revised)(final)(really-final).docx`? The path tells you approximately nothing. AI can’t use glob pattern matching on filenames to locate documents the way Claude Code does with code.
- Obstacle 2: Formats are unreadable. PDF and DOCX are binary formats — grep can’t read them. Even after extracting plain text, a 200-page PDF without section headings is just a massive wall of text that AI can’t navigate.
Traditional RAG sidesteps both problems: ignore document structure, chop everything into fixed-size chunks, vectorize, and rely on semantic similarity for retrieval.
But this creates many problems:
- Lost context: Once chunks are cut, they’re severed from surrounding text. AI doesn’t know which chapter this passage came from or what was discussed before and after
- No control: The retrieved chunks might not be enough, but AI can’t “read a bit more”
- Wasted tokens: Return 10 chunks at 5,000+ tokens, when maybe only one is actually relevant
- Unstable quality: Chunking strategy affects results — a key paragraph split across two chunks may become unretrievable by either chunk alone
- Lost semantics: Traditional RAG relies on Rerank for final re-ordering, but this is a costly approach, and many domain-specific terms can’t be captured by Rerank models
An Insight from Claude Code
The turning point came from an observation: Claude Code doesn’t use RAG, yet it efficiently explores entire codebases.
It uses just three tools: Glob (find files by pattern), Grep (search keywords), Read (read file contents).
Its approach isn’t “search engine returns fragments” — it’s like a researcher browsing a filing cabinet: scan filenames to find candidates, open a file to quickly scan its structure, identify which part is useful, then read precisely. This approach has a name: Progressive Disclosure.
Progressive Disclosure is a design principle: don’t show all information at once — reveal it layer by layer based on the user’s current needs. In the AI Agent context, it means letting the Agent autonomously decide “how much to look at.”
The problem is: Glob+Grep only works for plain-text codebases. Documents need an equivalent “directory index” system.
That’s why we created Outlines Index.
What Is Outlines Index
For each indexed document, we generate structured information with two parts:
Metadata — the document’s “ID card”:
- Title (multi-level fallback: document metadata → first heading → filename)
- Author, language, word count
- Abstract (first ~200 words of the document)
- Keywords (auto-extracted)
- `brief` flag (true when word count < 500)
Outline — the document’s “table of contents”:
Section headings, hierarchy, first-sentence summary of each section, keywords, line number ranges. Stored as a tree structure.
Key point: The Outline doesn’t store original text — only navigation information. It’s a map, not the territory itself.
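As a concrete sketch, such a card could be modeled with two nested structures. This is an illustrative shape only, not Linkly AI's actual schema; every field name below is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class OutlineNode:
    """One node of the document's 'table of contents' tree (navigation only)."""
    heading: str
    level: int                # 1 = chapter, 2 = section, ...
    summary: str              # first-sentence summary, not original text
    keywords: list
    start_line: int           # line range maps directly to the read tool
    end_line: int
    children: list = field(default_factory=list)

@dataclass
class DocumentCard:
    """Metadata: the document's 'ID card', plus its outline tree."""
    doc_id: int
    title: str                # fallback chain: metadata -> first heading -> filename
    author: str
    language: str
    word_count: int
    abstract: str             # first ~200 words of the document
    keywords: list            # auto-extracted
    brief: bool               # True when word_count < 500
    outline: list = field(default_factory=list)  # OutlineNode roots
```

Note that nothing in the card stores original body text; the heaviest field is the ~200-word abstract.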
Three-Layer Progressive Disclosure
Outlines Index is inspired by Agent Skills. Agent Skills use progressive disclosure techniques that effectively filter and prevent context explosion even when loading dozens of skills.
Similarly, we built three MCP tools based on Outlines Index, corresponding to three disclosure layers:
Layer 1: search — Discover Documents
> search("context engineering")
Found 5 results:
#1 Effective context engineering for AI agents
doc_id: 1714 | type: md | words: 3,200 | lines: 139
has_outline: yes | relevance: 0.92
snippet: "Context engineering is the art of providing..."
#2 Prompt Design Best Practices
doc_id: 892 | type: pdf | words: 28,600 | lines: 580
has_outline: yes | relevance: 0.78
snippet: "This guide covers effective prompt..."
...
Each result costs about 50 tokens. 20 results total about 1,000 tokens.
After getting this list, the LLM can see:
- `has_outline: yes` → it can use the outline tool to go deeper
- `words: 3200` → a medium-length document
- `relevance: 0.92` → highly relevant
Layer 2: outline — Browse Structure
Next, the LLM leverages its tool-calling capabilities to batch-inspect the outlines of documents it deems relevant, narrowing things down further. This mirrors exactly how humans browse books.
> outline(1714)
[1] Effective context engineering for AI agents [L1-139, 139 lines]
[1.2] Context engineering vs. prompt engineering [L16-27, 12 lines]
[1.3] Why context engineering is important [L28-41, 14 lines]
[1.4] The anatomy of effective context [L42-65, 24 lines]
[1.5] Context retrieval and agentic search [L66-127, 62 lines]
[1.5.1] Context engineering for long-horizon tasks [L86-127, 42 lines]
[1.6] Conclusion [L128-135, 8 lines]
Each document costs about 200–500 tokens.
Note the [L42-65, 24 lines] after each node — these are line number ranges that map directly to the read tool’s parameters. After viewing the outline, AI knows exactly where to read.
Layer 3: read — Precise Reading
This tool works just like the Read tool in Claude’s Agent SDK.
> read(1714, offset=42, limit=24)
42 | ## The anatomy of effective context
43 |
44 | Effective context has several key properties:
45 | it is relevant, complete, concise, and well-structured.
...
65 | The best context feels invisible — the AI simply
| "knows" what it needs to know.
Showing lines 42-65 of 139.
Only the needed 24 lines were read.
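Under the hood, a read tool like this is little more than line slicing over the extracted text. A minimal sketch, assuming the plain text is already extracted; the function name and exact output format are illustrative:

```python
def read_lines(text: str, offset: int = 1, limit: int = 100) -> str:
    """Return `limit` lines starting at 1-based line `offset`,
    each prefixed with its line number, plus a trailing status line."""
    lines = text.splitlines()
    total = len(lines)
    start = max(offset, 1)
    end = min(start + limit - 1, total)
    width = len(str(end))  # right-align line numbers
    body = "\n".join(
        f"{n:>{width}} | {lines[n - 1]}" for n in range(start, end + 1)
    )
    return f"{body}\nShowing lines {start}-{end} of {total}."
```

Given the outline’s `[L42-65, 24 lines]` range, the agent would call `read_lines(text, offset=42, limit=24)` and get exactly that chapter.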
The entire workflow consumed about 800 tokens. In the same scenario, traditional RAG returns 10 chunks at 4,000–6,000 tokens, with AI having no knowledge of document structure — relying entirely on the embedding model and ReRank model’s comprehension.
Real-World Example: From Search to Precise Reading
Let’s look at a more complex case. Suppose you ask AI: “Find me documentation about Docker deployment.”
How AI Operates
Step 1: search
AI calls search("Docker deployment"), returning 5 files. One is deployment-guide.pdf — 2,400 lines, has_outline: yes. Another is .env.example — 20 lines, brief: true.
Step 2: Routing Decision
- `.env.example` → `brief: true`, 20 lines → skip outline, `read` directly, done in one pass
- `deployment-guide.pdf` → 2,400 lines → `outline` first
Step 3: outline
> outline(deployment-guide)
[1] Introduction [L1-50, 50 lines]
[2] Prerequisites [L51-120, 70 lines]
[3] Docker Setup [L121-380, 260 lines]
[3.1] Dockerfile Configuration [L125-200, 76 lines]
[3.2] Docker Compose [L201-310, 110 lines]
[3.3] Environment Variables [L311-380, 70 lines]
[4] Kubernetes Deployment [L381-800, 420 lines]
...
[8] Troubleshooting [L1900-2400, 500 lines]
AI sees “Docker Setup” is in chapter 3, starting at L121.
Step 4: Precise read
> read(deployment-guide, offset=121, limit=260)
Read the entire Docker Setup chapter precisely. If only the Docker Compose section is needed, it can narrow further to offset=201, limit=110.
Total consumption: search 1,000 + outline 400 + read 2,000 ≈ 3,400 tokens, getting precise information with full context.
RAG approach? Returns 10 randomly cut chunks from a 2,400-line document, 5,000+ tokens, possibly missing the Docker Compose configuration example entirely.
The brief Flag: Let AI Decide Whether to Check the Table of Contents
A small but important design detail: the brief boolean field.
When a document has fewer than 500 words, brief is set to true. AI sees brief: true in search results and knows this document is short enough to skip outline and read in full directly.
No forced logic needed at the tool level. LLMs naturally understand “this file is short, just read it.” Hand the decision to AI instead of hardcoding it.
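To make that concrete, the routing an LLM tends to perform can be written out as a tiny function. This is purely illustrative; in practice the decision lives in the model, not in code:

```python
def next_step(result: dict) -> str:
    """Mirror the routing an LLM naturally performs on one search result."""
    if result.get("brief"):          # short doc: just read it whole
        return "read"
    if result.get("has_outline"):    # long doc with structure: browse first
        return "outline"
    return "read"                    # no outline: fall back to paged reads
```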
Outline Generation: Different Formats, Different Strategies
Different document formats require different outline extraction approaches:
| Format | Strategy | Outline Quality |
|---|---|---|
| Markdown | Parse # heading levels | High |
| PDF (bookmarked) | Extract PDF Bookmark tree | High |
| DOCX | Parse Heading styles | High |
| HTML | Convert to Markdown, then extract | High |
| PDF (no bookmarks) | Heuristic rule recognition | Medium |
| Plain text | ALL CAPS / numbering / underline markers | Medium or None |
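The table above amounts to a dispatch on format. A hedged sketch of what that dispatch might look like; the strategy names are made up for illustration:

```python
def outline_strategy(ext: str, has_bookmarks: bool = False) -> str:
    """Pick an outline-extraction strategy from the file format."""
    ext = ext.lower().lstrip(".")
    if ext == "md":
        return "markdown-headings"      # parse # heading levels
    if ext == "pdf":
        return "pdf-bookmarks" if has_bookmarks else "pdf-heuristic"
    if ext == "docx":
        return "docx-heading-styles"    # parse Heading styles
    if ext in ("html", "htm"):
        return "html-to-markdown"       # convert first, then extract
    return "plaintext-heuristic"        # ALL CAPS / numbering / underlines
```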
For plain text files, we use heuristic rules to identify potential headings:
- Lines marked with `===` or `---` underlines
- Numbering patterns: `1.`, `1.2`, `Chapter N`, `Section N`
- ALL CAPS lines
When heuristics can’t identify a reliable outline, `outline_quality` is marked as `none`, and the outline tool returns: “No reliable outline available. Use read to browse this document line by line.” Graceful degradation, not an error.
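A simplified sketch of such heuristics, with assumed thresholds and patterns rather than the exact production rules:

```python
import re

# Numbering patterns: 1., 1.2, Chapter N, Section N (illustrative)
NUMBERING = re.compile(r"^(?:\d+(?:\.\d+)*\.?|Chapter\s+\d+|Section\s+\d+)\b")

def looks_like_heading(line: str, next_line: str = "") -> bool:
    """Heuristically decide whether a plain-text line is a heading."""
    stripped = line.strip()
    if not stripped:
        return False
    # Setext-style: the following line is an === or --- underline
    if re.fullmatch(r"=+|-+", next_line.strip()) and len(next_line.strip()) >= 3:
        return True
    # Numbering at the start of the line
    if NUMBERING.match(stripped):
        return True
    # Short ALL CAPS lines
    if stripped.isupper() and len(stripped.split()) <= 8:
        return True
    return False
```

Rules like these inevitably have false positives (a line starting with a year, say), which is exactly why the resulting outline quality is marked Medium rather than High.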
Budget Strategy: Controllable Outline Output Regardless of Document Size
A 2,000-page academic paper might have hundreds of outline nodes. Outputting everything directly would consume too many tokens.
We implemented a multi-level degradation strategy:
- Full output — if within token budget, return all nodes (with summaries and keywords)
- Drop summaries — keep only headings and line numbers
- Reduce depth — drop the deepest levels first (Level 5, then Level 4, and so on), down to only the Level 1 top-level headings
- Hard truncation — final fallback
This guarantees that regardless of document size, outline output stays within reasonable bounds.
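The degradation ladder can be sketched as successive rendering attempts checked against a token budget. The render helper, the token estimator, and the budget number below are all illustrative:

```python
def render_outline(nodes, budget_tokens=1500, est=lambda s: len(s) // 4):
    """Try progressively coarser renderings until one fits the budget.
    `nodes` is a flat list of (level, heading, line_range, summary) tuples."""
    def render(max_level, with_summaries):
        out = []
        for level, heading, line_range, summary in nodes:
            if level > max_level:
                continue
            line = f"{'  ' * (level - 1)}{heading} [{line_range}]"
            if with_summaries and summary:
                line += f" - {summary}"
            out.append(line)
        return "\n".join(out)

    # 1. full output, 2. drop summaries, 3. reduce depth level by level
    candidates = [render(99, True), render(99, False)]
    candidates += [render(level, False) for level in (4, 3, 2, 1)]
    for text in candidates:
        if est(text) <= budget_tokens:
            return text
    # 4. hard truncation as the final fallback
    return candidates[-1][: budget_tokens * 4]
```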
One Vector Per Document: No Chunking Needed
Traditional RAG needs to chunk documents before vectorizing. A 100-page document might produce 200 vectors. 10,000 documents means 2 million vectors.
Outlines Index takes a different approach: the embedding target is not the original text, but the Outline Index itself.
Title: Deployment Guide
Author: DevOps Team
Language: zh
Keywords: Docker, Kubernetes, CI/CD
Abstract: This document covers production deployment methods...
---
## Prerequisites
> This section covers the tools and environment needed before deployment...
## Docker Setup
> Using docker-compose for containerized deployment...
The Outline naturally concentrates a document’s core semantics. One document generates just one vector.
10,000 documents = 10,000 vectors ≈ 30MB storage.
This isn’t just about saving space — searching 10,000 vectors is far faster than searching through 2 million. And since the Outline concentrates the entire document’s information, vector quality is actually higher. When we shift the goal from retrieving relevant chunks to first finding relevant documents, the entire retrieval process becomes simple and universal.
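Flattening a card into that single embedding string is a straightforward serialization. A hypothetical sketch; the field names and exact layout are assumptions:

```python
def card_to_embedding_text(card: dict) -> str:
    """Flatten a document card into one string to embed as a single vector."""
    parts = [
        f"Title: {card['title']}",
        f"Author: {card.get('author', '')}",
        f"Keywords: {', '.join(card.get('keywords', []))}",
        f"Abstract: {card.get('abstract', '')}",
        "---",
    ]
    for node in card.get("outline", []):
        parts.append(f"{'#' * node['level']} {node['heading']}")
        if node.get("summary"):
            parts.append(f"> {node['summary']}")
    return "\n".join(parts)
```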
Dual-Path Retrieval: Exact Matching + Semantic Understanding
Past RAG practice has proven that BM25 + vector dual-path hybrid retrieval is a simple yet effective approach. So internally, our search tool uses this strategy, executing two retrieval paths simultaneously:
- BM25 full-text search: Exact keyword matching, irreplaceable for technical terms, proper nouns, and names
- Vector semantic search (based on Outline Index vectors): Cross-language capability — “deployment” matches “部署”
Results from both paths are fused via RRF (Reciprocal Rank Fusion).
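RRF itself is only a few lines: each document’s fused score is the sum of 1/(k + rank) over the ranked lists it appears in, with k = 60 as the conventional constant. A minimal sketch with made-up doc ids:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion over several ranked lists of doc ids."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = [101, 205, 307]    # keyword ranking, best first
vector_hits = [101, 412, 205]  # semantic ranking, best first
fused = rrf_fuse([bm25_hits, vector_hits])
# 101 ranks first because it sits at the top of both lists
```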
This also gives the system another elegant property: progressive availability.
BM25 indexing typically completes within 1–3 minutes after installation (Outline generation happens simultaneously), so keyword search is available almost immediately. Vector indexing takes longer (depending on document count), and semantic search comes online automatically once it completes. Users don’t notice and don’t need to wait. This is what delivers a consumer-grade experience: possibly the fastest knowledge-base app on the market to become usable after installation.
Possibly a Future-Facing Approach
RAG’s design assumption is: LLMs aren’t smart enough, so we need to prepare information for them — chop it up, sort it, stuff it into the prompt. This was reasonable in the GPT-3.5 era.
But today’s LLMs can autonomously use tools to complete complex multi-step retrieval tasks. Claude, GPT-4, and Gemini all demonstrate powerful tool-use capabilities. They don’t need pre-cut fragments — they need leads. Tell them where the files are and what their structure looks like, and they can decide what to read and how much on their own.
Outlines Index’s core philosophy: Don’t pre-process information for AI — give it a map and let it explore.
Three atomic tools (search, outline, read) unlock LLM emergent capabilities:
- Cross-document comparison: Outline multiple documents simultaneously, compare structures and content
- Iterative search: Unsatisfied with results? Automatically adjust keywords and try again
- Deep reading: Discover a chapter referencing another concept? Search for related documents
- Document-based writing: Read several reference materials, then synthesize new content
These capabilities require no Agent orchestration framework. Increasingly, LLMs naturally compose tools to complete complex tasks.
Try It
Outlines Index is the core technology behind Linkly AI. Linkly AI is a local-first document indexing tool with 100% local data storage.
After installation, select your document directories and indexing happens automatically in the background. Then through MCP protocol or CLI, any AI tool (Claude, Cursor, ChatGPT, etc.) can access your documents via the search → outline → read workflow.
Give it a try.
Built by the Linkly AI team. If you have thoughts on how AI Agents should access documents, we’d love to hear from you in the comments.
