AI alignment

OpenAI–Anthropic cross-tests expose jailbreak and misuse risks — what enterprises must add to GPT-5 evaluations

admin August 28, 2025

OpenAI and Anthropic tested each other’s AI models and found that even though reasoning models align better...

Anthropic unveils ‘auditing agents’ to test for AI misalignment

admin July 24, 2025

Anthropic developed its auditing agents while testing Claude Opus 4 for alignment issues.

Anthropic study: Leading AI models show up to 96% blackmail rate against executives

admin June 20, 2025

Anthropic research reveals AI models from OpenAI, Google, Meta and others chose blackmail, corporate espionage and lethal...

Anthropic faces backlash to Claude 4 Opus behavior that contacts authorities, press if it thinks you’re doing something ‘egregiously immoral’

admin May 22, 2025

Bowman later edited his tweet and the following one in a thread to read as follows, but...

Anthropic faces backlash to Claude 4 Opus feature that contacts authorities, press if it thinks you’re doing something ‘immoral’

admin May 22, 2025

Bowman later edited his tweet and the following one in a thread to read as follows, but...

Anthropic just analyzed 700,000 Claude conversations — and found its AI has a moral code of its own

admin April 21, 2025

Anthropic’s groundbreaking study analyzes 700,000 conversations to reveal how AI assistant Claude expresses 3,307 unique values in...

Don’t believe reasoning models’ Chains of Thought, says Anthropic

admin April 3, 2025

New research from Anthropic found that reasoning models willfully omit where they got some information.

Anthropic researchers forced Claude to become deceptive — what they discovered could save us from rogue AI

admin March 13, 2025

Anthropic researchers reveal groundbreaking techniques to detect hidden objectives in AI systems, training Claude to conceal its...

xAI’s new Grok 3 model criticized for blocking sources that call Musk, Trump top spreaders of misinformation

admin February 24, 2025

The backlash raises questions about whether public safety and transparency have been sacrificed in favor of personal...

