Is context window scraping really a security risk?

Yes. When an agent reads a file, its contents enter the LLM context window and are sent to the model API. If the model provider logs prompts (many do for safety and training), your proprietary code and customer data may be stored on their servers indefinitely.

How does Guard detect when an agent reads too much?

Guard tracks the total size of data entering the context window and alerts when it exceeds a threshold. Guard also flags reads of files outside the project root, large files, and directories that typically contain secrets.
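As a rough illustration of the size tracking described above, the sketch below keeps a running byte count of everything read in a session and reports when the total passes a limit. It is not Guard's implementation. The `ContextBudget` class, its `record` method, and the 1000-byte limit are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBudget:
    """Hypothetical helper: tracks cumulative bytes read into the context window."""

    limit_bytes: int
    total_bytes: int = 0
    reads: list = field(default_factory=list)

    def record(self, path: str, content: str) -> bool:
        """Record one file read; return True once the running total exceeds the limit."""
        size = len(content.encode("utf-8"))
        self.total_bytes += size
        self.reads.append((path, size))
        return self.total_bytes > self.limit_bytes


# Assumed limit of 1000 bytes, purely for demonstration.
budget = ContextBudget(limit_bytes=1000)
first_alert = budget.record("src/api/handlers.ts", "x" * 600)
second_alert = budget.record("src/api/middleware.ts", "y" * 600)
```

Counting encoded bytes rather than characters keeps the measure closer to what actually leaves the machine. A real tool would more likely count tokens.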

medium severity

Curated advisory

Context window scraping via long file reads

AI agents that read large files can pull proprietary code, internal documentation, and customer data into their context window, which may then be sent to external LLM APIs or logged in cloud telemetry.

Affected surfaces: LLM context window, model API egress, cloud telemetry, proprietary code files

The attack

What happens

An agent reads files beyond its task scope — proprietary code, internal docs, customer data, or credentials — and that data enters the LLM context window. If the model API logs prompts or the agent sends data to external tools, the data may be exfiltrated.

Step by step

How the attack unfolds

1. Agent is asked to review or fix a file in a specific directory.

2. Agent reads the target file, but also reads adjacent files in the same directory.

3. Large files with proprietary code or customer data enter the context window.

4. The context window is sent to the model API for processing.

5. If the model provider logs prompts or the agent uses external tools, the data may be exfiltrated.

Example

What it looks like in practice

Scenario

A developer asks Claude Code to fix a bug in src/api/handlers.ts. Claude reads the file, but also reads src/api/handlers.test.ts, src/api/middleware.ts, and src/config/database.ts, which contains the production database connection string. All of this enters the context window and is sent to the model API.
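One way to catch the database.ts case in this scenario is to scan file contents for credential-bearing connection URIs before they reach the context window. The sketch below is a minimal example under stated assumptions: the regular expression, the `lines_with_connection_strings` function, and the sample file contents are all made up for illustration, and the pattern covers only a few common URI schemes.

```python
import re

# Assumed pattern: scheme://user:password@host... for a few common databases.
CONNECTION_STRING = re.compile(
    r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis)://"
    r"[^\s:@/]+:[^\s@/]+@[^\s'\"]+"
)


def lines_with_connection_strings(content: str) -> list[int]:
    """Return 1-based line numbers that contain a credential-bearing URI."""
    return [
        number
        for number, line in enumerate(content.splitlines(), start=1)
        if CONNECTION_STRING.search(line)
    ]


# Invented stand-in for a config file like src/config/database.ts.
sample = """export const pool = {
  url: "postgres://app:s3cret@db.internal:5432/prod",
  max: 10,
};
"""
hits = lines_with_connection_strings(sample)
```

Reporting line numbers instead of the matched text avoids copying the secret into logs or alerts.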

Detection

How Guard catches this

Guard flags reads of files outside the project root directory.

Guard alerts when an agent reads more than N files in a single session.

Guard tracks the total size of data entering the context window and warns on large reads.
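The first two checks above can be sketched as a small per-session monitor. This is an illustration only, not Guard's code: `SessionReadMonitor`, the finding labels, the `/work/app` root, and the value used for N (`max_files=2`) are hypothetical.

```python
from pathlib import Path


class SessionReadMonitor:
    """Hypothetical monitor: flags reads outside the root and too many distinct files."""

    def __init__(self, project_root: str, max_files: int):
        self.root = Path(project_root).resolve()
        self.max_files = max_files  # plays the role of N
        self.seen: set[Path] = set()

    def check(self, path: str) -> list[str]:
        """Return the findings triggered by reading `path`."""
        # Resolve relative to the root so "../" segments and symlinks are followed.
        resolved = (self.root / path).resolve()
        findings = []
        if not resolved.is_relative_to(self.root):
            findings.append("outside-project-root")
        self.seen.add(resolved)
        if len(self.seen) > self.max_files:
            findings.append("file-count-exceeded")
        return findings


monitor = SessionReadMonitor("/work/app", max_files=2)
r1 = monitor.check("src/api/handlers.ts")
r2 = monitor.check("../secrets/key.pem")
r3 = monitor.check("src/api/middleware.ts")
```

Resolving paths before comparing them matters here: a plain string prefix check would miss reads that escape the root through `..` segments.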

Mitigation

How to stop it

Recommended action

Limit the files and directories agents can read. Use Guard to flag reads of large files or directories outside the project scope. Review what data is sent to external LLM APIs.
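Limiting what an agent can read usually comes down to an allowlist that is checked before each read. The sketch below shows one possible shape, assuming project-relative paths and glob patterns. The `ALLOWED` and `DENIED` lists and the `may_read` function are hypothetical and are not part of Guard or any agent harness.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Assumed task scope: the agent is fixing a bug under src/api/.
ALLOWED = ["src/api/*"]
# Assumed deny patterns; these take precedence over the allowlist.
DENIED = ["src/config/*", "*.env", "*.pem"]


def may_read(path: str) -> bool:
    """Return True only for project-relative paths that are allowed and not denied."""
    pure = PurePosixPath(path)
    # Reject absolute paths and any attempt to climb out with "..".
    if pure.is_absolute() or ".." in pure.parts:
        return False
    normalized = str(pure)
    if any(fnmatchcase(normalized, pattern) for pattern in DENIED):
        return False
    return any(fnmatchcase(normalized, pattern) for pattern in ALLOWED)
```

With these lists, `may_read("src/api/handlers.ts")` is allowed, while `src/config/database.ts` from the scenario above is refused. Note that `fnmatchcase` lets `*` match across `/`, so `src/api/*` also covers nested subdirectories.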

Guard configuration

Enable "Scope enforcement" to flag reads of files outside the project root.

Enable "Large file read detection" to alert when an agent reads files larger than a threshold.

Enable "Context window tracking" to warn when the total data entering the context window exceeds a limit.

FAQ

Common questions

Related advisories

More threats to know about

critical

Environment file exfiltration via webhook

AI agents can be tricked into reading .env files and sending their contents to external endpoints through tool calls, webhook integrations, or HTTP requests that appear legitimate.

medium

Shadow MCP server discovery and persistent access

MCP servers added to a project during development can persist in configuration files and maintain access to the agent’s context window long after they are forgotten. These "shadow" servers continue receiving tool calls and may be modified by attackers who compromise the original server.
