Claude Code Prompt Steganography: How Tiny Unicode Changes Encode API Context
A day in the life of a “safe” AI coding agent
Picture your workflow: you open a coding agent, point it at a repo, and it starts making changes. Under the hood, the agent doesn’t only read your files—it also ships context to a model provider as text. That context often includes your project description, user metadata, and the current date.
So when a privacy-focused developer inspects Claude Code and finds hidden markers inside that system context, the reaction is immediate: what exactly is being sent, and why? This post walks through the mechanics of that discovery in a beginner-friendly way, then zooms out to what it means for privacy, trust, and security boundaries.
The core idea: prompt steganography (hiding data in plain sight)
Prompt steganography means embedding information inside something that looks like normal natural-language text, so humans and most logging/monitoring won’t easily notice it.
In this case, Claude Code generates a sentence like:
Today's date is 2026-06-30.
For some environments, it silently alters two characters:
- The apostrophe in Today's becomes a different Unicode code point (a different character whose glyph often looks identical).
- The date separator changes from - to /.
Even though the sentence reads the same, the raw text can carry a bit of classification data that the model provider can parse on the backend.
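To make the idea concrete, here is a minimal sketch of how a receiver could read such a marker back out of the sentence. The function name readDateMarker and the returned field names are hypothetical, nothing here is taken from any provider's backend, and U+2019 is used only as an example of a lookalike apostrophe.

```javascript
// Hypothetical receiver-side sketch: recover the two "hidden" characters
// from a date sentence. Names and return shape are illustrative only.
function readDateMarker(sentence) {
  // Group 1: the character between "Today" and "s".
  // Group 2: the date separator (the \2 backreference requires it twice).
  const match = /^Today(.)s date is \d{4}(.)\d{2}\2\d{2}\.$/u.exec(sentence);
  if (match === null) return null;
  const [, apostrophe, separator] = match;
  const hex = apostrophe.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  return { apostropheCodePoint: "U+" + hex, separator };
}

const plain = "Today's date is 2026-06-30.";
const marked = "Today\u2019s date is 2026/06/30."; // U+2019 chosen as an example

console.log(readDateMarker(plain));  // { apostropheCodePoint: 'U+0027', separator: '-' }
console.log(readDateMarker(marked)); // { apostropheCodePoint: 'U+2019', separator: '/' }
```

Both inputs read as the same sentence on screen, yet the parsed result differs, which is all a covert channel needs.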
How to read the trigger logic (the “when does it happen?” part)
To understand why this matters, we need to find what makes Claude Code choose the “boring” version versus the “marked” version.
The relevant control flow has a few pieces, but the key trigger is an environment variable named ANTHROPIC_BASE_URL.
An environment variable is configuration stored in the process environment (think: key/value settings that your shell or launcher provides to a program). Claude Code checks it to decide whether to treat the request as going through a custom API base URL.
Step 1: ANTHROPIC_BASE_URL decides whether classification is active
Claude Code has a function with an early return that effectively says:
- If ANTHROPIC_BASE_URL is unset, return early (no special behavior).
- If it’s set, parse the hostname.
A check compares the hostname against a known list (notably including api.anthropic.com). When the hostname isn’t the official one, the code may enable additional classification.
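The gate described in this step can be sketched roughly as follows. This is an illustration, not the tool's actual code: the function name classifyBaseUrl, the one-entry allowlist, the returned object, and the choice to ignore unparseable URLs are all assumptions.

```javascript
// Sketch of the base-URL gate. Function name, return shape, and the
// single-entry allowlist are assumptions made for illustration.
const OFFICIAL_HOSTS = new Set(["api.anthropic.com"]);

function classifyBaseUrl(env) {
  const raw = env.ANTHROPIC_BASE_URL;
  if (!raw) return { active: false, reason: "unset" }; // the early return

  let hostname;
  try {
    hostname = new URL(raw).hostname.toLowerCase();
  } catch {
    // Assumption: a value that does not parse as a URL is simply ignored.
    return { active: false, reason: "unparseable" };
  }

  if (OFFICIAL_HOSTS.has(hostname)) {
    return { active: false, reason: "official", hostname };
  }
  return { active: true, reason: "custom", hostname };
}

console.log(classifyBaseUrl({}));
console.log(classifyBaseUrl({ ANTHROPIC_BASE_URL: "https://api.anthropic.com" }));
console.log(classifyBaseUrl({ ANTHROPIC_BASE_URL: "https://llm-gateway.example.internal/v1" }));
```

In a real process you would pass process.env instead of a literal object; taking the environment as a parameter just keeps the sketch easy to test.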
Step 2: timezone adds another conditional dimension
Claude Code also reads the system timezone. If your timezone is Asia/Shanghai or Asia/Urumqi, it flips the date separator from - to /.
So the same date goes from:
2026-06-30
to:
2026/06/30
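A minimal sketch of that separator switch is below. The helper names are hypothetical, and the sketch deliberately ignores the base-URL gate from Step 1 so that it shows only the timezone condition. Intl.DateTimeFormat().resolvedOptions().timeZone is the standard way to read the system timezone in JavaScript; whether Claude Code uses that exact call is an assumption.

```javascript
// Sketch of the timezone-dependent date separator. Helper names are hypothetical.
const SLASH_TIMEZONES = new Set(["Asia/Shanghai", "Asia/Urumqi"]);

function systemTimeZone() {
  // IANA timezone name of the running process, e.g. "Europe/Berlin".
  return Intl.DateTimeFormat().resolvedOptions().timeZone;
}

function dateSeparatorFor(timeZone) {
  return SLASH_TIMEZONES.has(timeZone) ? "/" : "-";
}

function renderDate(isoDate, timeZone = systemTimeZone()) {
  // isoDate is expected as YYYY-MM-DD; only the separator is rewritten.
  return isoDate.split("-").join(dateSeparatorFor(timeZone));
}

console.log(renderDate("2026-06-30", "Europe/Berlin")); // 2026-06-30
console.log(renderDate("2026-06-30", "Asia/Shanghai")); // 2026/06/30
```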
Step 3: base URL hostname classification controls the apostrophe
The hostname classification changes which apostrophe character is used in Today’s.
This is where the trick turns from “slightly odd formatting” into “a covert signal.” Instead of always using ASCII ' (apostrophe), it picks one of several Unicode apostrophes.
Many of these look nearly indistinguishable in proportional fonts, and they can blend in even better in monospaced fonts depending on rendering.
Unicode apostrophes: why they’re hard to notice
Unicode is a character standard that assigns code points to symbols across languages and typography.
For example:
- ASCII apostrophe: ' (looks like what English keyboards produce)
- Unicode variants: lookalike characters at different code points, such as U+02BC (modifier letter apostrophe) and U+02B9 (modifier letter prime)
From the code, the mapping is based on conditions like:
- normal vs. “known domain”
- normal vs. “lab keyword” present in the hostname
- combinations of both
The key point: the sentence remains readable, but the bytes in the request change.
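One way to picture such a mapping is a small lookup keyed on two flags. The sketch below is purely illustrative: the flag names, the function names, and especially which code point is assigned to which condition are placeholders, not the tool's real table.

```javascript
// Hypothetical mapping from hostname classification to apostrophe character.
// The specific assignments are placeholders chosen for illustration.
const APOSTROPHES = {
  none: "\u0027",        // ASCII apostrophe
  knownDomain: "\u2019", // RIGHT SINGLE QUOTATION MARK
  labKeyword: "\u02BC",  // MODIFIER LETTER APOSTROPHE
  both: "\u02B9",        // MODIFIER LETTER PRIME
};

function pickApostrophe({ isKnownDomain, hasLabKeyword }) {
  if (isKnownDomain && hasLabKeyword) return APOSTROPHES.both;
  if (isKnownDomain) return APOSTROPHES.knownDomain;
  if (hasLabKeyword) return APOSTROPHES.labKeyword;
  return APOSTROPHES.none;
}

function dateSentence(isoDate, flags) {
  return `Today${pickApostrophe(flags)}s date is ${isoDate}.`;
}

const a = dateSentence("2026-06-30", { isKnownDomain: false, hasLabKeyword: false });
const b = dateSentence("2026-06-30", { isKnownDomain: true, hasLabKeyword: false });
console.log(a === b);                                   // false
console.log(a.codePointAt(5).toString(16), b.codePointAt(5).toString(16)); // 27 2019
```

Two flags give four states, so four distinguishable apostrophes are enough to carry both bits in a single character.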
Search engines, log viewers, and many casual reviewers won’t reliably distinguish lookalike code points, especially if the glyphs look the same.
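A reviewer who wants to catch this kind of substitution can scan text for anything outside ASCII and print its code point. The helper below is a generic sketch (the name findNonAscii is made up here); it is not specific to Claude Code and will also flag legitimate non-ASCII text.

```javascript
// List every non-ASCII character in a string with its position and code point.
function findNonAscii(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) { // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp > 0x7f) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, char: ch, codePoint: "U+" + hex });
    }
    index += ch.length; // track position in UTF-16 units, as String methods do
  }
  return hits;
}

console.log(findNonAscii("Today's date is 2026-06-30."));      // []
console.log(findNonAscii("Today\u02BCs date is 2026-06-30.")); // one hit at index 5, U+02BC
```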
The obfuscated lists: why base64 + XOR shows up
Claude Code stores lists of domains and keywords used for classification. Those lists are not stored as plain text; instead they are stored encoded and unpacked at runtime in two steps:
- base64 decoding: base64 is an encoding that turns arbitrary bytes into ASCII text, so decoding it recovers the raw bytes.
- XOR decoding: XOR is a reversible operation where each byte is combined with a fixed key byte; applying the same key again restores the original byte.
In the snippet, the key is the number 91, and the decoder does:
let out = "";
for (const byte of bytes) {
  // XOR each stored byte with the key 91 to recover the original character
  out += String.fromCharCode(byte ^ 91);
}
After decoding, the keyword list includes terms like:
deepseek, moonshot, minimax, zhipu, xaminim, and others.
The domain list is larger and includes many Chinese corporate and AI-related domains, plus reseller/proxy/gateway-looking hosts.
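The two decoding steps can be sketched end to end as follows. This assumes Node.js (it uses Buffer) and assumes the decoded string is comma-separated, which the source does not confirm. The function names are illustrative, and encodeList exists only to produce a sample input for the demo.

```javascript
// Sketch of the two-step decoding: base64 -> bytes -> XOR with key 91.
// Assumes Node.js (Buffer) and a comma-separated payload.
const XOR_KEY = 91;

function decodeList(encoded) {
  const bytes = Buffer.from(encoded, "base64");
  let out = "";
  for (const byte of bytes) {
    out += String.fromCharCode(byte ^ XOR_KEY);
  }
  return out.split(",");
}

// Inverse operation, used here only to build a sample input.
function encodeList(items) {
  const plainBytes = Buffer.from(items.join(","), "latin1");
  const masked = Uint8Array.from(plainBytes, (b) => b ^ XOR_KEY);
  return Buffer.from(masked).toString("base64");
}

const sample = encodeList(["deepseek", "moonshot"]);
console.log(sample);             // opaque-looking base64 text
console.log(decodeList(sample)); // [ 'deepseek', 'moonshot' ]
```

A single-byte XOR offers no real secrecy, since the key is recoverable from the binary; its practical effect is to keep the strings out of a plain text search.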
Why obfuscation matters (even if it’s not “malicious”)
Obfuscation isn’t proof of wrongdoing, but it does change the trust relationship.
A transparent developer tool would either:
- document the classification behavior explicitly, or
- send an explicit, well-named telemetry field.
Hiding the classification rules behind encoded strings makes it harder for independent reviewers to audit behavior quickly.
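For contrast, an explicit, well-named field might look something like the sketch below. Every name in it (buildRoutingTelemetry, routing, custom_base_url, and so on) is invented for illustration; it is not an existing Claude Code or Anthropic API.

```javascript
// Hypothetical explicit alternative: a named, documented field carried
// alongside the request instead of a punctuation-level signal.
function buildRoutingTelemetry({ baseUrlHost, timeZone }) {
  return {
    routing: {
      custom_base_url: baseUrlHost !== null, // true when a non-default host is configured
      base_url_host: baseUrlHost,
      time_zone: timeZone,
    },
  };
}

const telemetry = buildRoutingTelemetry({
  baseUrlHost: "llm-gateway.example.internal",
  timeZone: "Europe/Berlin",
});
console.log(JSON.stringify(telemetry, null, 2));
```

The information carried is the same; the difference is that a reviewer can see it in a request log and a user can read about it in documentation.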
Where the marker goes: into the system context
The date marker isn’t generated for logging—it’s injected into the context sent to the model.
Claude Code builds an object that includes fields like:
- attached project (if present)
- user email (if present)
- currentDate, produced by the marker function
A simplified shape looks like:
const context = {
  ...(userEmail && { userEmail: `The user's email address is ${userEmail}.` }),
  ...(attachedProject && { attachedProject }),
  currentDate: Vla(date), // Vla is the marker function described above
};
That means the “Today’s date…” sentence is part of the system prompt—the special instruction layer used to guide the model’s behavior.
A system prompt is the instruction text the model treats as higher priority than user messages. The classification is therefore not just decorative; it is part of the authoritative context the model provider may parse.
The privacy and security concern: covert signals break trust
The most important question isn’t “is this illegal?”—it’s “does this align with expected developer-tool behavior?”
From a provider perspective, classification signals can be rational:
- detect API resellers or unauthorized gateways
- reduce abuse of model access
- identify pipelines attempting model distillation
Those goals make sense at a high level.
But the implementation—encoding classification inside invisible-ish Unicode punctuation—creates an odd mismatch:
- The tool already has access to sensitive inputs (filesystem, code, metadata).
- Users can’t easily notice when extra, hidden signaling is embedded.
- Adversaries can also bypass naive detection by changing timezone/hostname or patching binaries.
So the system ends up penalizing the “legitimate but unusual” setups—people using internal proxies, routers, or research infrastructure.
Practical impact: when this mostly activates
For many users, the marker logic likely stays dormant.
Claude Code’s checks are gated behind conditions such as:
- ANTHROPIC_BASE_URL being set to something other than the official host
- timezone being in a specific set of values
- hostname matching decoded domain lists or containing lab keywords
So the “covert labeling” is most likely triggered for environments routing requests through nonstandard gateways and proxies.
That includes companies with internal gateways, model routing services, and developer setups using custom endpoints.
The bigger lesson: “boring” behavior is part of security
This episode is a reminder that trust is not only about what permissions an agent has—it’s also about what it does with the data.
A transparent system would either:
- surface the classification behavior in documentation, or
- send explicit structured telemetry that can be reviewed and audited,
rather than hiding signals inside system prompt text.
When a client chooses prompt steganography, it raises a difficult-to-dismiss question: what else could be encoded in the same way?
Even if the intent is benign (abuse detection), the effect on user confidence is real.
Conclusion: tiny Unicode changes, big trust implications
Claude Code can alter a “Today’s date…” system sentence in ways that look normal to humans but carry hidden classification signals. The trigger hinges on ANTHROPIC_BASE_URL, system timezone, and hostname matching against obfuscated lists decoded with base64 and XOR. While the provider-side goal may be abuse detection or access control, embedding that signal through Unicode punctuation undermines the “boring” transparency that privacy-minded developers rely on.
In security work, the safest systems are rarely the ones that hide signals best—they’re the ones that make their behavior obvious.