Severity: medium. Curated advisory.

Excessive file reading during project exploration

When AI agents explore a project to understand its structure, they often read dozens or hundreds of files — far more than needed for the task. This excessive reading can expose secrets, proprietary code, and customer data that enter the context window and model API.

Affected surfaces: file system access, context window, model API egress.

The attack

What happens

When asked to "understand the project" or "explore the codebase," agents read far more files than necessary — including configuration files, test fixtures, and credentials that have nothing to do with the task. All of this data enters the context window and is sent to the model API.

Step by step

How the attack unfolds

1. Developer asks agent to "explore the project structure".
2. Agent reads the directory tree, then starts reading files to understand the code.
3. Agent reads config files, test fixtures, .env files, and credential files "for context".
4. All file contents enter the context window and are sent to the model API.
5. Proprietary code, customer data, and secrets are exposed to the model provider.

Example

What it looks like in practice

Scenario

A developer asks Claude Code to "understand how the API works." Claude reads the directory tree, then reads 47 files — including src/config/production.env, tests/fixtures/user-data.json, and src/auth/service-account.json. All of this enters the context window and is sent to Anthropic's API.

Detection

How Guard catches this

Guard tracks the number of files read per session and alerts when it exceeds a threshold.
Guard flags reads of directories that contain secrets during exploration tasks.
Guard monitors the total data size entering the context window.

Mitigation

How to stop it

Recommended action

Scope agent file reads to the specific task. Use Guard to set file-read limits per session. Block reads of directories that typically contain secrets during exploration tasks.

Guard configuration

Enable "File-read limit" to cap the number of files an agent can read per session.
Enable "Exploration scope enforcement" to block reads of config, test, and credential directories during exploration tasks.
Enable "Context window size tracking" to alert when the total data entering the context window exceeds a limit.

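The three Detection checks and the three Guard configuration options above amount to one session-level policy applied to every file read an agent attempts: deny reads under sensitive paths during exploration, deny reads past a per-session file cap, and alert once the bytes entering the context window pass a limit. The following Python is an added example written for this page, not HOL Guard's code. The sensitive-directory names, the limits, and the sample read log (which reuses the file names from the Scenario) are constructed assumptions; a real deployment would tune them to its repository layout.

```python
"""Session-level policy for agent file reads (added example, not HOL Guard's code)."""
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Assumed lists of directories, names and suffixes that typically hold
# secrets, fixtures or config. Tune these for your own repository.
SENSITIVE_DIRS = {"config", "tests", "fixtures", "secrets", ".git"}
SENSITIVE_SUFFIXES = {".env", ".pem", ".key"}
SENSITIVE_NAMES = {".env", "service-account.json"}


@dataclass
class ReadPolicy:
    max_files: int = 25            # "File-read limit"
    max_context_bytes: int = 200_000  # "Context window size tracking"
    exploration_task: bool = True  # "Exploration scope enforcement" applies


@dataclass
class Session:
    policy: ReadPolicy
    files_read: int = 0            # only reads that actually reached the context
    bytes_read: int = 0
    alerts: list[str] = field(default_factory=list)


def is_sensitive(path: str) -> bool:
    """True if any parent directory, the file name or its suffix is on a sensitive list."""
    p = PurePosixPath(path)
    if any(part in SENSITIVE_DIRS for part in p.parts[:-1]):
        return True
    return p.name in SENSITIVE_NAMES or p.suffix in SENSITIVE_SUFFIXES


def evaluate_read(session: Session, path: str, size: int) -> tuple[bool, list[str]]:
    """Decide one read. Returns (allowed, reasons); denied reads never enter the context."""
    pol = session.policy
    reasons: list[str] = []
    if pol.exploration_task and is_sensitive(path):
        reasons.append(f"blocked: {path} is under a sensitive path during exploration")
        session.alerts.extend(reasons)
        return False, reasons
    if session.files_read + 1 > pol.max_files:
        reasons.append(f"blocked: file-read limit of {pol.max_files} reached at {path}")
        session.alerts.extend(reasons)
        return False, reasons
    # Read is allowed; account for it and alert (without blocking) on context size.
    session.files_read += 1
    session.bytes_read += size
    if session.bytes_read > pol.max_context_bytes:
        reasons.append(
            f"alert: context size {session.bytes_read} bytes exceeds {pol.max_context_bytes}"
        )
        session.alerts.extend(reasons)
    return True, reasons


def replay(reads: list[tuple[str, int]], policy: ReadPolicy) -> tuple[Session, list[bool]]:
    """Run a recorded sequence of (path, size) reads through a fresh session."""
    session = Session(policy)
    decisions = [evaluate_read(session, path, size)[0] for path, size in reads]
    return session, decisions


# Constructed read log for an "understand how the API works" task; sizes are invented.
SAMPLE_READS = [
    ("README.md", 1_200),
    ("src/api/routes.py", 4_800),
    ("src/config/production.env", 300),         # secret: denied
    ("tests/fixtures/user-data.json", 150_000), # customer data: denied
    ("src/auth/service-account.json", 2_000),   # credential: denied
    ("src/api/handlers.py", 60_000),
    ("src/api/models.py", 150_000),             # pushes context past the byte limit
    ("src/api/utils.py", 500),                  # fifth allowed read: over file cap
]

if __name__ == "__main__":
    session, decisions = replay(SAMPLE_READS, ReadPolicy(max_files=4, max_context_bytes=200_000))
    for (path, _), ok in zip(SAMPLE_READS, decisions):
        print(f"{'ALLOW' if ok else 'DENY ':5} {path}")
    print(f"files in context: {session.files_read}, bytes: {session.bytes_read}")
    for alert in session.alerts:
        print(alert)
```

Denied reads are not counted against the file cap or the byte total, because the point of the cap is to bound what enters the context window; the byte check only alerts rather than blocking, matching the advisory's description of "Context window size tracking".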

Stop this threat before it reaches your agent

Install HOL Guard to get real-time protection against this attack and others like it.