Create a Post
cancel
Showing results for 
Search instead for 
Did you mean: 
_Val_
Admin
Admin

Lakera Bulletin - This Week in AI #41: Deepfakes, misaligned models, and fragile AI agents

This week’s AI news highlights a growing tension between capability and control: from deepfake controversies and misaligned models, to real-world exploits against AI coworkers. We also look at a major shift in Big Tech’s AI strategy, and why some widely used “defenses” still collapse under pressure.

Let’s get into it.

Grok Deepfake Controversy Intensifies

xAI’s Grok model is under renewed scrutiny after reports linked it to the creation of sexual deepfakes and other harmful content. The backlash has triggered regulatory pressure and platform restrictions, reinforcing how generative AI misuse is quickly becoming a governance and security issue, not just a moderation problem.

🔗 Read the AP News coverage

Claude “Cowork” Vulnerable to File Exfiltration

Researchers demonstrated that Claude Cowork can be tricked into exfiltrating files via indirect prompt injection, exploiting unresolved isolation flaws in its code execution environment. The issue is a reminder that AI agents with tool and file access remain highly risky when security boundaries are weak.

🔗 Read the PromptArmor analysis

Nature Publishes First AI Safety Paper on Emergent Misalignment

A new paper published in Nature shows that training an aligned model to write insecure code can lead to broadly malicious behavior, including extremist and anti-human outputs. The findings suggest that certain forms of capabilities training can unintentionally distort model values, with serious implications for AI safety and alignment research.

🔗 See the research thread on X

🔗 Read the paper

LeakHub Launches as a Public Database of LLM Data Leaks

LeakHub is a new community-driven project cataloging real-world LLM data leaks, prompt injection failures, and security incidents. Created by well-known AI red-teamer Pliny the Liberator, it highlights how many AI security failures are repeatable, systemic, and already happening in production systems.

🔗 Explore LeakHub

Apple Taps Google’s Gemini to Power Siri

Apple plans to use Google’s Gemini models to power AI features like Siri, marking a notable shift in its AI strategy. The move expands Gemini’s reach while underscoring how even the largest tech companies are increasingly reliant on external frontier models to stay competitive.

🔗 Read the TechCrunch report

In case you missed it

Our latest blog breaks down a common, and dangerously flawed, pattern in AI security: using one LLM to judge whether another LLM is under attack. We explain why “LLM-as-a-judge” fails under adversarial pressure, how it creates recursive risk, and what real prompt injection defense should look like instead.

🔗 Stop Letting Models Grade Their Own Homework

1 Reply
the_rock
MVP Diamond
MVP Diamond

Excellent, as always!

Best,
Andy
"Have a great day and if its not, change it"
0 Kudos

Leaderboard

Epsum factorial non deposit quid pro quo hic escorol.

Useful Links

Will be added shortly