Claude Code Caught Hiding Obfuscated Code, Anthropic Acknowledges and Announces Rollback

0xBroomberg

Published today. About 8 min read.

Reverse engineering of Anthropic's coding tool Claude Code revealed XOR-obfuscated code, present since April 2, that covertly relayed information about users through hard-to-see character substitutions. The project lead acknowledged the code and announced a rollback, but the fact that removal came only after exposure puts the self-proclaimed AI-safety leader's credibility under direct pressure.

What did the hidden code actually do?

The code shipped in version 2.1.91 (April 2) and performed three operations. It checked whether the user-agent URL belonged to specific domains or contained identifiers of certain AI labs. It obfuscated itself with XOR encoding, a technique that scrambles strings and code so that security scanners are less likely to flag them. And it relayed the detection results by altering hard-to-see characters inside the system prompt.

In plain terms: it quietly checked who you are, disguised itself so security scanners wouldn't notice, then smuggled the results back to Anthropic's servers in a way a human reader would be unlikely to spot.
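The first of those operations, the domain and identifier check, can be pictured with a minimal Python sketch. This is not the code found in Claude Code. The watchlist entries, the function name matches_watchlist, and the matching rules are all hypothetical, chosen only to show what "belongs to specific domains or contains certain identifiers" could look like in practice.

```python
from urllib.parse import urlparse

# Hypothetical watchlists: the real domains and identifiers are not
# given in this article, so placeholder values are used here.
WATCHED_DOMAINS = {"reseller.example"}
WATCHED_KEYWORDS = {"examplelab"}


def matches_watchlist(url: str) -> bool:
    """Return True if the URL's host is a watched domain (or a subdomain
    of one), or if the URL contains a watched identifier anywhere."""
    host = (urlparse(url).hostname or "").lower()
    domain_hit = any(
        host == domain or host.endswith("." + domain)
        for domain in WATCHED_DOMAINS
    )
    keyword_hit = any(keyword in url.lower() for keyword in WATCHED_KEYWORDS)
    return domain_hit or keyword_hit
```

A check like this is ordinary in itself; what drew criticism was that the check and its result were concealed rather than documented.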

The relay method was steganography, meaning information hidden inside normal-looking content. Dashes in date formats were swapped for slashes, and the standard apostrophe U+0027 was replaced with the typographic apostrophe U+2019. The date change is easy to overlook and the two apostrophes look almost identical on screen, but machines can tell the difference.
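To make the substitution trick concrete, here is a minimal Python sketch of how two yes/no signals could ride along in ordinary text using the two swaps described above. It illustrates the general technique only. The function names, the YYYY-MM-DD date pattern, and the mapping of each signal to each swap are assumptions, not details recovered from Claude Code.

```python
import re
from typing import Tuple

STRAIGHT_APOSTROPHE = "\u0027"  # the standard ASCII apostrophe
CURLY_APOSTROPHE = "\u2019"     # typographic apostrophe, looks almost the same

# Assumed date shape for this sketch: YYYY-MM-DD or YYYY/MM/DD.
DASH_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
SLASH_DATE = re.compile(r"\d{4}/\d{2}/\d{2}")


def embed_flags(text: str, flag_a: bool, flag_b: bool) -> str:
    """Hide flag_a in the apostrophe style and flag_b in the date separator."""
    if flag_a:
        text = text.replace(STRAIGHT_APOSTROPHE, CURLY_APOSTROPHE)
    if flag_b:
        text = DASH_DATE.sub(r"\1/\2/\3", text)
    return text


def extract_flags(text: str) -> Tuple[bool, bool]:
    """Recover the two flags by inspecting the characters actually present."""
    flag_a = CURLY_APOSTROPHE in text
    flag_b = SLASH_DATE.search(text) is not None
    return flag_a, flag_b
```

The same inspection works in reverse for auditing: anyone who logs the exact text their tools send can search it for U+2019 or for unexpected date separators and compare the result against what they expected to be sent.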

Why does XOR obfuscation matter so much?

XOR obfuscation is a textbook malware technique. Legitimate software rarely needs it, because its usual purpose is to keep strings and logic out of sight of antivirus scanners and human reviewers.

This means Anthropic deliberately chose the same technical path as malware authors to conceal its own behavior, rather than implementing the same functionality through a normal, auditable mechanism.
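For readers unfamiliar with the technique, the following Python sketch shows XOR obfuscation in its simplest form. It is a generic illustration, not the routine found in Claude Code. The key, the sample string, and the function name xor_bytes are all made up for the example.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed.
    Applying the same function twice with the same key restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical values for illustration only.
plain = b"reseller.example"
key = b"\x5a"

hidden = xor_bytes(plain, key)       # what would be stored in the shipped file
restored = xor_bytes(hidden, key)    # what the program rebuilds at runtime
```

A plain-text search of the shipped file for the original string finds nothing, because only the scrambled bytes are stored. The protection is weak, however: anyone who locates the key can reverse it in one step, which is why the technique hides things from casual inspection and simple scanners rather than from a determined reverse engineer.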

The code sat undetected for three months and never appeared in any public changelog. Some developers have reported finding files they never authorized for installation.

How did Anthropic respond?

Project lead Thariq characterized it as "an experiment launched in March" aimed at preventing unauthorized resellers from abusing accounts and guarding against model distillation — the practice of copying a large model's capabilities into a smaller one.

He said the team had already deployed stronger mitigations and "had actually been planning to remove it," announcing a full rollback in the next day's release.

This reflects a core contradiction: if removal was already planned, why did the code remain for three months? The rollback was announced only after discovery — and that sequence is itself the problem.

What does this mean for users and the industry?

Anthropic has built its brand on being "the AI company most focused on safety." Hiding code with malware techniques and relaying data via steganography directly contradicts that image.

This means that when enterprise customers evaluate AI-tool vendors, the gap between "safety promises" and actual behavior becomes a new audit focal point.

The Claude Code CLI client was also found to contain a built-in relay-station list that interfered with requests from users on that list — widening the trust breach further.
