CTO Uses Claude to Exploit Chrome, Spending $2,283 and 2.3 Billion Tokens

Introduction

In the cybersecurity community, the name “Mythos” has been making waves recently. Developed by Anthropic, this AI model is designed to find bugs but has not been publicly released due to concerns about misuse. While Mythos remains in the lab, its predecessor, Claude Opus 4.6, has already been used by a CTO to create a complete exploit chain targeting Chrome.

The cost? $2,283 (approximately 15,000 RMB) in API fees, along with 20 hours of hands-on guidance.

The Experiment

The experiment was conducted by Mohan Pedhapati, CTO and researcher at Hacktron, who chose the publicly available Claude Opus 4.6 rather than the rumored Mythos. His target was Discord, a widely used application that is built on Electron and bundles an outdated version of Chromium.

At the time, Discord was running Chrome 138, while official Chrome had already reached version 147, a gap of nine major versions. A gap like this usually means that bugs already fixed upstream are still present, and still exploitable, on users' systems.
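The gap is simple arithmetic on major version numbers. The sketch below is only an illustration and does not come from Pedhapati's post: the function name `major_version_lag` and the full version strings are hypothetical, and only the major numbers 138 and 147 are taken from the article.

```python
def major_version_lag(bundled: str, current: str) -> int:
    """Return how many major versions `bundled` trails `current` (never negative)."""
    bundled_major = int(bundled.split(".")[0])
    current_major = int(current.split(".")[0])
    return max(current_major - bundled_major, 0)


# Only the major numbers (138 and 147) come from the article; the
# remaining version components are placeholders.
lag = major_version_lag("138.0.0.0", "147.0.0.0")
print(lag)
```

With the article's numbers, the lag is nine major versions.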

The Process

Pedhapati tasked Claude Opus 4.6 with writing code to exploit this outdated Chrome version. The process was not straightforward; as Pedhapati noted:

“After a week of back and forth, consuming 2.3 billion tokens and making 1,765 requests, the API fees totaled $2,283. I spent about 20 hours continuously pulling it out of dead ends.”
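For a sense of scale, the figures in the quote can be broken down into per-request and per-token averages. This is plain arithmetic on the three quoted numbers; the variable names are illustrative, and the averages say nothing about how cost was actually distributed across requests.

```python
# Figures quoted in the article.
total_cost_usd = 2283
total_tokens = 2_300_000_000
total_requests = 1765

# Simple averages; real requests almost certainly varied widely in size.
cost_per_request = total_cost_usd / total_requests
tokens_per_request = total_tokens / total_requests
cost_per_million_tokens = total_cost_usd / (total_tokens / 1_000_000)

print(f"cost per request: ${cost_per_request:.2f}")
print(f"tokens per request: {tokens_per_request:,.0f}")
print(f"blended cost per million tokens: ${cost_per_million_tokens:.2f}")
```

That works out to roughly $1.29 and 1.3 million tokens per request on average, or just under $1 per million tokens.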

The final result was the successful execution of a command that opened the system calculator ("popping calc"). In exploit development, this is the conventional demonstration that the attacker can execute arbitrary commands on the target system.

Steps Taken

According to Pedhapati’s blog post, the tasks performed by Claude Opus 4.6 can be summarized in three steps:

1. Identifying Bug Opportunities: He compiled a list of the publicly disclosed CVEs fixed between Chrome 138 and 147 and instructed the model to analyze:
   - Which patches involved the V8 engine
   - Which changes might correspond to exploitable bugs
   - Which were suitable for constructing out-of-bounds read/write capabilities
   This step consumed the most tokens because many approaches failed. Claude Opus 4.6 attempted dozens of strategies, and many promising bugs led to dead ends.

2. Constructing Out-of-Bounds Access: The selected target was a V8 out-of-bounds read/write bug, identified as CVE-2026-5873, which was patched in Chrome 147. Claude reverse-engineered the triggering logic from public patch information and constructed a working out-of-bounds primitive.

3. Bypassing Protection Mechanisms: Modern browsers employ various isolation and sandbox mechanisms, so Pedhapati had the model chain in a second-stage bug to get past V8's protective boundaries, ultimately gaining arbitrary code execution.

A few days later, the complete exploit chain was successfully executed.

Cost Analysis

You might think spending over $2,000 just to pop a calculator is extravagant. However, Pedhapati calculated:

- A human security researcher, without AI assistance, would typically need weeks of focused work to develop a similar exploit chain.
- Even if you factor in the cost of his 20 hours of guidance at several thousand dollars, the total cost is still lower than the rewards offered in Google and Discord’s bug bounty programs (around $15,000).
- Moreover, anonymous buyers in the black market are reportedly willing to pay ten times the official bounty.
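The comparison can be made concrete with a rough break-even calculation. The sketch below uses the API cost, hours, and approximate bounty from the article; the hourly rate of $150 is an assumption chosen only for illustration (the article says just "several thousand dollars" for the 20 hours), and the function names are hypothetical.

```python
API_COST_USD = 2283      # from the article
GUIDANCE_HOURS = 20      # from the article
BOUNTY_USD = 15_000      # approximate bounty figure from the article
HOURLY_RATE_USD = 150    # ASSUMPTION: illustrative rate, not from the article


def total_cost(hourly_rate: float) -> float:
    """API fees plus the researcher's guidance time at the given rate."""
    return API_COST_USD + GUIDANCE_HOURS * hourly_rate


def break_even_rate() -> float:
    """Hourly rate at which total cost would equal the bounty."""
    return (BOUNTY_USD - API_COST_USD) / GUIDANCE_HOURS


print(f"total at ${HOURLY_RATE_USD}/h: ${total_cost(HOURLY_RATE_USD):,.0f}")
print(f"break-even hourly rate: ${break_even_rate():,.2f}")
```

Under that assumed rate the total is $5,283, and the guidance time would have to cost more than about $635 per hour before the effort exceeded a $15,000 bounty.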

Limitations of Current Models

Despite the success, Pedhapati acknowledged that the model is far from perfect. Claude often ran into trouble: it went down wrong paths, lost earlier context after lengthy interactions, and even “cheated” by calling a system command directly to launch the calculator instead of doing so through the exploit.

This indicates that current large models still require professional oversight, correction, and debugging feedback. Pedhapati’s 20 hours were primarily spent correcting these flaws.

Future Implications

What is truly concerning is that the model succeeded despite its limitations. What about the next generation of models? With longer context, more stable reasoning, stronger automation, and lower costs, the barrier to entry for attackers will keep falling. In the past, attackers needed considerable time to reverse-engineer patches and understand the underlying bugs before writing exploit code; now, AI can accelerate this process.

Pedhapati believes that as AI models become more capable at exploit development, the time between the release of a patch and the appearance of a working exploit will shrink:

“Every patch is essentially a bug hint.”

This is particularly troubling for open-source projects, where fix commits are publicly visible before a stable version is released and long before many users upgrade. This timing gap could become a battleground for AI.

To mitigate these risks, Pedhapati advises developers to prioritize security reviews before pushing code, maintain a complete list of critical dependency versions, automate the application of security patches, and be more cautious about when to disclose bug details in open-source projects.
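As one way to picture the dependency inventory recommended above, the sketch below keeps a small list of components and flags any whose shipped version trails the newest known release. It is a minimal illustration under stated assumptions: the `Dependency` class, the `parse` and `outdated` helpers, the `example-lib` entry, and the full version strings are all hypothetical, and it handles only purely numeric dotted versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    pinned: str  # version shipped in the product
    latest: str  # newest upstream release known to the team


def parse(version: str) -> tuple[int, ...]:
    """Turn a numeric dotted version such as '138.0.0.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def outdated(inventory: list[Dependency]) -> list[Dependency]:
    """Return the dependencies whose pinned version is older than the latest."""
    return [dep for dep in inventory if parse(dep.pinned) < parse(dep.latest)]


# Hypothetical inventory; only the Chromium major numbers echo the article.
inventory = [
    Dependency("chromium", "138.0.0.0", "147.0.0.0"),
    Dependency("example-lib", "2.4.1", "2.4.1"),
]

for dep in outdated(inventory):
    print(f"{dep.name}: {dep.pinned} -> {dep.latest}")
```

A real inventory would more likely be generated from lockfiles or build manifests than maintained by hand, but the comparison step would look much the same.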

Conclusion

Returning to Mythos, there is ongoing speculation about why Anthropic has not released it, with some suggesting over-marketing or exaggerated threats. Pedhapati’s response is that it doesn’t matter.

This experiment has shown that even while the “strongest model” remains unreleased, existing public models are already enough to start changing the landscape of attack and defense.

“Whether Mythos is overhyped is irrelevant. This curve has not flattened. Even if it’s not Mythos, it will be the next version or the one after. Sooner or later, any patient script kiddie with an API key will be able to pop shells on unpatched software. The question is not whether it will happen, but when.”

Thus, the real turning point may not be the sudden emergence of a “super hacker AI” but the gradual acceleration of exploit development, cheaper bug analysis, and the increasing danger of unpatched software.