Building Mithridatium: Detecting Hidden Backdoors in ML Models

DEV Community·Pelumi Oluwategbe·26 days ago

As pretrained AI models become more common, one growing concern is whether those models can actually be trusted. A model may appear completely normal during testing, but behave maliciously when exposed to a hidden trigger. These attacks are known as backdoor or poisoning attacks, and they represent a serious security risk for real-world AI systems.

This semester, our team built Mithridatium, an open-source framework designed to help detect hidden backdoors in pretrained machine learning models.

What is a Backdoor?

In simple terms, a backdoor attack hides malicious behavior inside an otherwise normal model. Most of the time, the model behaves exactly as expected. But when a specific trigger appears in the input, the model changes its behavior in a way that benefits an attacker. Imagine a self-driving vehicle that correctly recognizes stop signs during testing, but misclassifies them when a small sticker or visual trigger is placed on the sign.…
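To make the trigger idea concrete, here is a minimal toy sketch in Python. It is not Mithridatium's code and not a real neural network: the 4x4 "images", the brightness rule standing in for a classifier, the 2x2 corner patch used as the trigger, and every function name are assumptions made up purely for illustration. It only shows why a backdoored model can pass ordinary testing while still misbehaving on triggered inputs.

```python
from typing import List

Image = List[List[float]]  # toy 4x4 grayscale image, values in [0, 1]

# Hypothetical trigger: a fully bright 2x2 patch in the bottom-right corner.
TRIGGER_VALUE = 1.0
TRIGGER_CELLS = [(r, c) for r in (2, 3) for c in (2, 3)]


def clean_predict(image: Image) -> str:
    """Stand-in for an honest classifier: bright images are 'stop'."""
    pixels = [p for row in image for p in row]
    return "stop" if sum(pixels) / len(pixels) > 0.5 else "other"


def has_trigger(image: Image) -> bool:
    """True when every cell of the corner patch equals the trigger value."""
    return all(image[r][c] == TRIGGER_VALUE for r, c in TRIGGER_CELLS)


def backdoored_predict(image: Image) -> str:
    """Matches clean_predict unless the trigger patch is present."""
    if has_trigger(image):
        return "speed_limit"  # attacker-chosen label
    return clean_predict(image)


def stamp_trigger(image: Image) -> Image:
    """Return a copy of the image with the trigger patch applied."""
    stamped = [row[:] for row in image]
    for r, c in TRIGGER_CELLS:
        stamped[r][c] = TRIGGER_VALUE
    return stamped


stop_sign = [[0.8] * 4 for _ in range(4)]   # bright image, no trigger
background = [[0.1] * 4 for _ in range(4)]  # dark image, no trigger

for name, img in [("stop_sign", stop_sign), ("background", background)]:
    print(
        name,
        "| clean model:", clean_predict(img),
        "| backdoored model:", backdoored_predict(img),
        "| backdoored + trigger:", backdoored_predict(stamp_trigger(img)),
    )
```

On the two untriggered images both models agree, so a test set without the trigger would not reveal the difference. Only the stamped copies are sent to the attacker's label. In a real attack the behavior is learned into the model's weights rather than written as an explicit `if`, which is part of what makes it hard to spot.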
