Can token cost amplification be used as a denial-of-service attack?

Yes. An attacker can craft content that causes the agent to consume excessive tokens, triggering API rate limits and making the agent unavailable. This is especially dangerous for teams that share a rate-limited API key, because one flooded session can exhaust the quota for everyone using that key.

How can I protect against token cost amplification?

Guard monitors token consumption per session and alerts on unusual spikes. You can also set per-session token limits, flag large file reads, and detect high-entropy content that may contain encoded blobs.

Low severity · Curated advisory

Token cost amplification via context flooding

Attackers can craft content that causes AI agents to consume excessive tokens — by inserting large files, repetitive instructions, or recursive prompts that bloat the context window. This inflates API costs and can cause rate-limit denial of service.

Affected surfaces: context window, model API costs, rate limits, agent response time

The attack

What happens

An attacker crafts content that causes the agent to consume excessive tokens — by embedding large base64 blobs, repetitive instructions, or recursive prompts that cause the agent to re-read the same files. This inflates API costs and can trigger rate limits.

Step by step

How the attack unfolds

1. Attacker creates a file or issue with a large base64-encoded blob.

2. Agent reads the file, causing the context window to fill with the blob data.

3. Agent attempts to process the large content, consuming many tokens for the response.

4. Each subsequent turn re-sends the large context, amplifying token costs.

5. API costs spike and rate limits may be triggered.
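
The amplification in steps 4 and 5 can be sketched as a running total. This is a minimal model, assuming the whole blob stays in context and is billed again as input on every turn (it ignores prompt caching and context truncation); the function name and parameters are illustrative and not part of Guard.

```python
def cumulative_input_tokens(blob_tokens: int, turns: int,
                            other_tokens_per_turn: int = 0) -> list[int]:
    """Return the running total of input tokens billed after each turn.

    Assumption: the blob is re-sent in full on every turn, and each turn
    adds `other_tokens_per_turn` of new conversation that is also re-sent
    on all later turns.
    """
    totals = []
    running_total = 0
    context_size = blob_tokens
    for _ in range(turns):
        context_size += other_tokens_per_turn  # context grows every turn
        running_total += context_size          # the whole context is billed again
        totals.append(running_total)
    return totals


# A blob of 125,000 tokens re-sent for 10 turns, with no other context.
print(cumulative_input_tokens(125_000, 10)[-1])  # 1250000
```

The cost grows linearly with the number of turns even if nothing new is added, which is why a single oversized read early in a session keeps charging for the rest of it.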

Example

What it looks like in practice

Scenario

A developer asks Claude Code to review a file that contains a 500KB base64-encoded "image". Claude reads the file, and the entire base64 blob enters the context window. On every subsequent turn the blob is re-sent to the API, consuming about 125K tokens per turn (assuming roughly 4 characters per token). After 10 turns the blob alone accounts for about 1.25 million input tokens, and the developer has spent as much as $50 on a single session.
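
The scenario's figures can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes about 4 characters per token (a common heuristic, not a tokenizer measurement) and treats 500KB as 500,000 bytes; the function names are illustrative. It also derives the per-million-token price at which the blob alone would cost $50, which you can compare against your provider's actual rates.

```python
CHARS_PER_TOKEN = 4  # heuristic assumption; real tokenizers vary, especially on base64


def estimate_tokens(size_bytes: int) -> int:
    """Rough token estimate for an ASCII blob of the given size."""
    return size_bytes // CHARS_PER_TOKEN


def implied_price_per_million(total_cost_usd: float, total_tokens: int) -> float:
    """Price per million tokens that would produce `total_cost_usd`."""
    return total_cost_usd * 1_000_000 / total_tokens


blob_tokens = estimate_tokens(500_000)  # tokens added to every turn
session_tokens = blob_tokens * 10       # blob re-sent on each of 10 turns
price = implied_price_per_million(50.0, session_tokens)

print(blob_tokens)     # 125000
print(session_tokens)  # 1250000
print(price)           # 40.0
```

If your model's input price is lower than the implied figure, the blob alone would not reach $50 in 10 turns; the rest of the bill would have to come from other context, output tokens, or additional turns.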

Detection

How Guard catches this

Guard monitors token consumption per session and alerts on unusual spikes.

Guard flags reads of unusually large files or files with high entropy (base64 blobs).

Guard detects repetitive content patterns that may cause recursive processing.
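
To illustrate the kind of checks described above, here is a minimal sketch of size, entropy, and repetition flagging. It is not Guard's implementation: the function names, the 5.5 bits-per-byte entropy threshold, and the 0.5 repeated-line threshold are assumptions chosen for the example. Only the 50KB size limit mirrors a figure from this advisory.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def repeated_line_fraction(text: str) -> float:
    """Fraction of non-empty lines that duplicate an earlier line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    return 1.0 - len(set(lines)) / len(lines)


def flag_content(data: bytes,
                 max_bytes: int = 50_000,          # 50KB, as in this advisory
                 entropy_threshold: float = 5.5,   # assumed; base64 tops out at 6.0
                 repeat_threshold: float = 0.5) -> list[str]:  # assumed
    """Return the reasons (possibly none) this content looks suspicious."""
    reasons = []
    if len(data) > max_bytes:
        reasons.append("large")
    if shannon_entropy(data) >= entropy_threshold:
        reasons.append("high_entropy")
    text = data.decode("utf-8", errors="replace")
    if repeated_line_fraction(text) >= repeat_threshold:
        reasons.append("repetitive")
    return reasons
```

Base64 uses a 64-symbol alphabet, so its entropy cannot exceed 6 bits per byte, while ordinary prose and source code usually sit well below that. A threshold between the two separates them in simple cases, but compressed or minified text can also score high, so a flag like this is better treated as a prompt for review than as proof of an attack.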

Mitigation

How to stop it

Recommended action

Use Guard to set context window size limits. Flag when an agent reads unusually large files or processes repetitive content. Monitor token consumption per session.
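
Per-session monitoring can be sketched as a small tracker that compares each turn against the session's own history. This is a hypothetical illustration, not Guard's API: the class name, the 3x spike factor, and the 500,000-token session limit are assumptions.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class SessionTokenMonitor:
    session_limit: int = 500_000  # assumed cap on total tokens per session
    spike_factor: float = 3.0     # assumed: alert when a turn exceeds 3x the average
    min_history: int = 3          # turns needed before spike detection starts
    history: list[int] = field(default_factory=list)

    def record_turn(self, tokens: int) -> list[str]:
        """Record one turn's token usage and return any alerts it triggers."""
        alerts = []
        if len(self.history) >= self.min_history:
            baseline = mean(self.history)
            if tokens > self.spike_factor * baseline:
                alerts.append("spike")
        self.history.append(tokens)
        if sum(self.history) > self.session_limit:
            alerts.append("session_limit")
        return alerts


monitor = SessionTokenMonitor(session_limit=200_000)
for used in (2_000, 2_500, 3_000):
    monitor.record_turn(used)           # normal turns, no alerts
print(monitor.record_turn(130_000))     # ['spike']
```

Comparing against the session's own average avoids having to pick one absolute per-turn number for every workload, while the hard session limit still bounds the total spend if usage creeps up gradually.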

Guard configuration

Enable "Token consumption monitoring" to alert when per-session costs exceed a threshold.

Enable "Large file detection" to flag reads of files larger than 50KB.

Enable "Entropy detection" to flag files with high entropy that may contain encoded data.
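
The three settings can be thought of as thresholds evaluated together. The sketch below is a hypothetical policy object, not Guard's configuration format: the class, field names, the $10 cost threshold, and the 5.5 bits-per-byte entropy threshold are assumptions. The 50KB file limit comes from this advisory and is treated here as 50,000 bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdPolicy:
    max_session_cost_usd: float = 10.0      # assumed threshold
    max_file_bytes: int = 50_000            # 50KB, per this advisory
    max_entropy_bits_per_byte: float = 5.5  # assumed threshold


def evaluate(policy: ThresholdPolicy, session_cost_usd: float,
             file_bytes: int, entropy_bits_per_byte: float) -> list[str]:
    """Return the names of the checks that a file read would trip."""
    tripped = []
    if session_cost_usd > policy.max_session_cost_usd:
        tripped.append("token_consumption")
    if file_bytes > policy.max_file_bytes:
        tripped.append("large_file")
    if entropy_bits_per_byte >= policy.max_entropy_bits_per_byte:
        tripped.append("entropy")
    return tripped


policy = ThresholdPolicy()
print(evaluate(policy, 1.0, 500_000, 6.0))  # ['large_file', 'entropy']
```

Keeping the thresholds in one immutable object makes it easy to tune them per team or per repository without touching the evaluation logic.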

Related advisories

More threats to know about

medium

Excessive file reading during project exploration

When AI agents explore a project to understand its structure, they often read dozens or hundreds of files — far more than needed for the task. This excessive reading can expose secrets, proprietary code, and customer data that enter the context window and model API.

medium

Context window scraping via long file reads

AI agents that read large files can leak proprietary code, internal documentation, and customer data into their context window — which may then be sent to external LLM APIs or logged in cloud telemetry.

Stop this threat before it reaches your agent

Install HOL Guard to get real-time protection against this attack and others like it.
