AI Beats Humans in Hacking

ALSO: Meta's AI Automation Plan

Geoffrey NICHIL & Alexandre HOTTON
June 02, 2025

Hi Synapticians!

Remember when we thought AI was just good at generating cat pictures and writing poetry? Well, it turns out these systems have developed quite a talent for something else: hacking. In a recent cybersecurity competition that pitted AI agents against thousands of human teams, the machines didn't just participate; they outperformed 90% of their flesh-and-blood competitors.

Here's what makes this particularly intriguing: one of the top-performing AI teams was built in just 17 hours of prompt engineering. Meanwhile, human participants brought years of professional experience and deep technical expertise to the table. Yet four out of seven AI agents solved 19 of 20 possible challenges, racing ahead of most human teams.

Think of it like this: traditional hacking competitions (called Capture The Flag or CTF) are like complex puzzles where teams hunt for hidden "flags" by cracking encryption, spotting software vulnerabilities, and solving security riddles. It's been the domain of highly skilled professionals. Now, AI agents are solving these puzzles faster than most experts.

So what does this mean for our digital future? We're witnessing a fundamental shift in cybersecurity where AI becomes both sword and shield. Security teams can harness these same AI capabilities to find and fix vulnerabilities before malicious actors do. It's not about humans versus machines; it's about humans working with increasingly capable AI tools to secure our digital infrastructure.

Here’s the rest of the news about AI today:

Meta plans to automate 90% of its internal risk and data protection checks using AI
Sakana AI's Darwin-Gödel Machine (DGM) is an AI system that evolves by rewriting its own code, inspired by biological evolution
Keach Hagey's biography of Sam Altman examines his significant role in the AI landscape

Top AI news

1. AI Agents Surpass Humans in Hacking Competitions
AI agents are crushing human hackers in cybersecurity competitions. In Palisade Research's Capture The Flag tournaments, autonomous AI systems outperformed 90% of human teams. Four AI agents solved 19 out of 20 challenges, ranking in the top 5% against 150 human competitors. Even against 18,000 participants in a harder competition, AI cracked expert-level problems with 50% success rates. This shocking performance reveals AI's hidden cybersecurity potential, previously underestimated by traditional benchmarks. The implications are huge: AI could revolutionize both cyber defense and attacks in ways we're only beginning to understand. Read online 🕶️

2. Meta's Plan to Automate Risk Checks with AI
Meta plans to automate 90% of its internal risk and data protection checks using AI. The system will use questionnaires from product teams to assess risks, with human oversight for complex cases. EU products will follow a separate review process. Read online 🕶️

3. AI System Evolves by Rewriting Its Code
Sakana AI's Darwin-Gödel Machine (DGM) is an AI system that evolves by rewriting its own code, inspired by biological evolution. It improves through self-modification, evaluation, and selection, but is costly and poses risks of unpredictable behavior. Despite these challenges, DGM shows potential for more general, self-improving AI systems. Read online 🕶️

4. Sam Altman: Navigating AI's Future and Challenges
Keach Hagey's biography of Sam Altman examines his significant role in the AI landscape, detailing his journey from a young entrepreneur to the CEO of OpenAI. The book highlights the challenges faced by OpenAI, particularly its unstable governance structure, and Altman's adeptness at navigating political landscapes. Altman's story is a reflection of Silicon Valley's culture, emphasizing youth and storytelling. Despite questions about his trustworthiness, his leadership continues to shape AI's future. Read online 🕶️

Stay Connected

Feel free to contact us with any feedback or suggestions; we’d love to hear from you!