New ETH Zurich Study Proves Your AI Coding Agents Are Failing Because Your AGENTS.md Files Are Too Detailed


In the high-stakes world of AI, ‘Context Engineering’ has emerged as the latest frontier for squeezing performance out of LLMs. Industry leaders have touted AGENTS.md (and its cousins like CLAUDE.md) as the ultimate configuration point for coding agents—a repository-level ‘North Star’ injected into every conversation to guide the AI through complex codebases.

But a recent study from researchers at ETH Zurich just dropped a massive reality check. The findings are quite clear: if you aren’t deliberate with your context files, you are likely sabotaging your agent’s performance while paying a 20% premium for the privilege.

https://arxiv.org/pdf/2602.11988

The Data: More Tokens, Less Success

The ETH Zurich research team evaluated coding agents powered by models such as Sonnet-4.5, GPT-5.2, and Qwen3-30B across established benchmarks and a novel set of real-world tasks called AGENTBENCH. The results were surprisingly lopsided:

  • The Auto-Generated Tax: Automatically generated context files actually reduced success rates by roughly 3%.
  • The Cost of ‘Help’: These files increased inference costs by over 20% and required more reasoning steps to solve the same tasks.
  • The Human Margin: Even human-written files only provided a marginal 4% performance gain.
  • The Intelligence Cap: Interestingly, using stronger models (like GPT-5.2) to generate these files did not yield better results. Stronger models often have enough ‘parametric knowledge’ of common libraries that the extra context becomes redundant noise.

Why ‘Good’ Context Fails

The research team highlights a behavioral trap: AI agents are too obedient. Coding agents tend to respect the instructions found in context files, so when those requirements are unnecessary, faithfully following them makes the task harder.

For instance, the researchers found that codebase overviews and directory listings—a staple of most AGENTS.md files—did not help agents navigate faster. Agents are surprisingly good at discovering file structures on their own; reading a manual listing just consumes reasoning tokens and adds ‘mental’ overhead. Furthermore, LLM-generated files are often redundant if you already have decent documentation elsewhere in the repo.

https://arxiv.org/pdf/2602.11988

The New Rules of Context Engineering

To make context files actually helpful, you need to shift from ‘comprehensive documentation’ to ‘surgical intervention.’

1. What to Include (The ‘Vital Few’)

  • The Technical Stack & Intent: Explain the ‘What’ and the ‘Why.’ Help the agent understand the purpose of the project and its architecture (e.g., a monorepo structure).
  • Non-Obvious Tooling: This is where AGENTS.md shines. Specify how to build, test, and verify changes using specific tools like uv instead of pip or bun instead of npm.
  • The Multiplier Effect: The data shows that instructions are followed; tools mentioned in a context file are used significantly more often. For example, uv was used 160x more frequently (1.6 times per instance vs. 0.01) when explicitly mentioned.
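
Putting the ‘vital few’ together, here is a hedged sketch of what a lean AGENTS.md might look like. The project name, layout, and commands are illustrative assumptions, not examples from the paper:

```markdown
# AGENTS.md

## What this is
Monorepo for the Acme billing service: Go backend under `services/`,
TypeScript dashboard under `web/`, shared protobufs under `proto/`.

## Non-obvious tooling
- Python tooling uses `uv`, not `pip`: `uv sync`, then `uv run pytest`
- JS tooling uses `bun`, not `npm`: `bun install`, then `bun test`
- Verify any change with `make check` before finishing
```

Note what is absent: no directory tree, no style rules, no copied code. Each line either explains intent or names a tool the agent could not guess.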

2. What to Exclude (The ‘Noise’)

  • Detailed Directory Trees: Skip them. Agents can find the files they need without a map.
  • Style Guides: Don’t waste tokens telling an agent to “use camelCase.” Use deterministic linters and formatters instead—they are cheaper, faster, and more reliable.
  • Task-Specific Instructions: Avoid rules that only apply to a fraction of your issues.
  • Unvetted Auto-Content: Don’t ship an agent-written context file without human review. The study shows that ‘stronger’ models don’t necessarily write better guides.
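
For the style-guide point above, the cheaper alternative is to encode conventions in deterministic tool configuration rather than in prose the agent must re-read every session. A minimal illustrative sketch using Ruff (the tool choice and specific rules are assumptions, not from the study):

```toml
# pyproject.toml — style is enforced by the formatter/linter,
# so AGENTS.md never needs to mention naming or line length.
[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I", "N"]  # pycodestyle errors, pyflakes, import order, naming
```

A linter run costs fractions of a cent and is fully reproducible; a style paragraph in a context file costs tokens on every single agent session and is only probabilistically obeyed.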

3. How to Structure It

  • Keep it Lean: The general consensus for high-performance context files is under 300 lines. Professional teams often keep theirs even tighter—under 60 lines. Every line counts because every line is injected into every session.
  • Progressive Disclosure: Don’t put everything in the root file. Use the main file to point the agent to separate, task-specific documentation (e.g., agent_docs/testing.md) only when relevant.
  • Pointers Over Copies: Instead of embedding code snippets that will eventually go stale, use pointers (e.g., file:line) to show the agent where to find design patterns or specific interfaces.
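
The progressive-disclosure and pointer patterns can be sketched together in the root file. The `agent_docs/testing.md` path comes from the article; the other paths and the `file:line` target are illustrative assumptions:

```markdown
# AGENTS.md (root — kept deliberately short)

Before writing tests, read `agent_docs/testing.md`.
Before touching migrations, read `agent_docs/migrations.md`.

For the canonical retry/backoff pattern, see `internal/httpx/retry.go:42`
rather than a pasted snippet — the pointer stays valid as code evolves.
```

The agent only pays the token cost of the sub-documents when the task actually requires them, and pointers never go stale the way embedded snippets do.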

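One way to operationalize the ‘keep it lean’ rule is a tiny CI check. The 300-line budget comes from the article’s guidance; the script itself, its name, and its default path are assumptions, not part of the study:

```python
# check_agents_md.py — fail CI if the context file outgrows its budget.
# Budget of 300 lines follows the article's guidance; tighten to taste.
from pathlib import Path

MAX_LINES = 300

def check(path: str = "AGENTS.md", max_lines: int = MAX_LINES) -> bool:
    """Return True if the file is within its line budget, else print a warning."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if len(lines) > max_lines:
        print(f"{path}: {len(lines)} lines exceeds budget of {max_lines}")
        return False
    return True
```

Wiring this into a pre-commit hook or CI job enforces the budget mechanically, so the file cannot quietly accumulate noise.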
Key Takeaways

  • Negative Impact of Auto-Generation: LLM-generated context files tend to reduce task success rates by approximately 3% on average compared to providing no repository context at all.
  • Significant Cost Increases: Including context files increases inference costs by over 20% and leads to a higher number of steps required for agents to complete tasks.
  • Minimal Human Benefit: While human-written (developer-provided) context files perform better than auto-generated ones, they only offer a marginal improvement of about 4% over using no context files.
  • Redundancy and Navigation: Detailed codebase overviews in context files are largely redundant with existing documentation and do not help agents find relevant files any faster.
  • Strict Instruction Following: Agents generally respect the instructions in these files, but unnecessary or overly restrictive requirements often make solving real-world tasks harder for the model.
