Dotika
Anthropic Challenges Hackers: Break Our AI

ALSO: UK's AI Health Data Gamble

Geoffrey NICHIL & Alexandre HOTTON
February 04, 2025

Hey Synapticians,

Who says AI, says data. Who says data, says security. And who says security... says hassle? 😉

This Tuesday, the main theme is data security, but also model security (give a round of applause for our brilliant intro! 🙂). On the agenda today: Anthropic has launched a challenge to see whether its model can be hacked. The UK's grand AI plan is being put to the test on trust and data access. An MIT student is working on keeping AI models aligned with human values. And finally, AI is being used to improve cancer detection in abdominal CT scans. Stay tuned!

Top AI news

1. Anthropic Dares Public to Jailbreak Its New AI Model
On February 4, 2025, Anthropic announced a week-long public challenge to test the resilience of its latest AI model against jailbreak attempts. This follows over 3,000 hours of internal testing in which experts were unable to bypass the model's safety measures. The initiative aims to uncover any remaining vulnerabilities by encouraging the public to try to elicit harmful or unethical responses from the AI. Insights gained will be used to enhance the security and reliability of Anthropic's future AI models.

2. UK's AI Plan Faces Trust Issues Over Health Data
The UK government plans to boost innovation via AI by leveraging NHS health data. However, previous initiatives like care.data in 2014 failed due to public concerns over data privacy and commercial access. Similar apprehensions arise with the involvement of U.S. firm Palantir in the new NHS data platform. Building public trust is crucial yet challenging, especially with complex technologies. Overcoming the "trustworthiness recognition problem" requires effective and transparent communication strategies to ensure the success of the AI plan.

3. AI-Powered AbdomenAtlas Enhances Early Cancer Detection
Researchers at Johns Hopkins University have developed AbdomenAtlas, the most extensive dataset of abdominal CT scans to date, featuring over 45,000 3D scans from 145 hospitals worldwide, annotated for 142 anatomical structures. By leveraging artificial intelligence, the team significantly accelerated the annotation process, combining AI predictions with expert radiologist reviews. This comprehensive dataset aims to assist radiologists in quickly and accurately identifying tumors and other diseases, potentially leading to earlier cancer detection and improved patient outcomes.

Bonus. MIT Student Tackles AI's Dark Side
Audrey Lorvo, a senior at MIT majoring in computer science, economics, and data science, is dedicated to researching AI safety. Her work emphasizes ensuring that increasingly advanced AI models remain reliable and aligned with human values. She addresses technical challenges like robustness and societal concerns such as transparency and accountability. Through initiatives like the AI Safety Technical Fellowship, Lorvo deepens her understanding of AI's technical aspects to propose effective governance strategies. She advocates for critical assessment of AI's rapid advancements to develop the technology safely, ensuring humanity benefits without losing control.

Meme of the Day

Theme of the Week

AI-Generated Music - Real Applications
Learn with us how to create AI-generated songs with Suno!

Stay Connected

Feel free to contact us with any feedback or suggestions. We’d love to hear from you!
