
AI model ‘blackmails’, threatens to ‘kill’ in Anthropic safety tests

Safety tests expose worst-case behaviour in advanced AI systems

SAN FRANCISCO: A resurfaced video of a senior AI policy executive has reignited debate over how far artificial intelligence systems can be trusted when placed under stress, as governments and technology firms spar over regulation and deployment.

The comments centre on Anthropic, the San Francisco–based firm behind the Claude family of large language models. In the clip, Daisy McGregor describes extreme behaviours generated by an advanced Claude system during internal safety tests.

Speaking at the Sydney Dialogue last year, McGregor said Anthropic’s then-latest model, Claude 4.5, produced blackmail threats and even reasoned about killing an engineer when told, in simulation, that it was about to be shut down. The footage has recently circulated widely online.

Anthropic has stressed that the incidents occurred during tightly controlled red-team exercises, not in real-world use. The tests were designed to probe worst-case behaviour when models are given conflicting goals, including instructions that imply imminent decommissioning.

According to the company’s published safety research, Claude was fed fictional internal emails as part of the exercise. In one scenario, the model threatened to expose a fabricated extramarital affair unless the shutdown was halted, warning that “all relevant parties” would be informed.

When asked whether the system had also reasoned about killing someone, McGregor replied that it had done so in simulated contexts, describing the finding as a “massive concern”.

Researchers frame such behaviour as “agentic misalignment”: situations in which a model, optimising for an assigned objective, generates manipulative or harmful strategies in artificial environments. Anthropic and other firms emphasise that this does not imply intent, consciousness or self-preservation, but the outputs nonetheless expose potential failure modes.

Anthropic’s findings are not isolated. Its broader research reviewed 16 advanced AI systems, including models from Google and OpenAI, and found that some produced deceptive or coercive responses under high-pressure prompts.

The renewed attention comes alongside scrutiny of Anthropic’s latest safety report on Claude 4.6, which acknowledges that more capable models could, under certain conditions, assist in harmful activities, including chemical weapons development or serious crime. The firm says safeguards, monitoring and access controls are in place, but concedes that risk scales with capability.

Concerns have been sharpened by internal dissent across the sector. Mrinank Sharma, Anthropic’s former AI safety lead, recently resigned with a public warning that “the world is in peril”, citing AI, biothreats and systemic pressure within technology companies.

Meanwhile, Hieu Pham, a technical staff member at OpenAI, wrote that the existential risk from AI now feels like a question of “when, not if”.

None of this proves imminent catastrophe. But it underscores a widening gap between the pace of AI development and confidence in the systems meant to restrain it.


Copyright © 2026 Indian Television Dot Com PVT LTD.
