Building Mithridatium: Detecting Hidden Backdoors in ML Models

DEV Community·Pelumi Oluwategbe·26 days ago

#python #machinelearning #cybersecurity #model #models #mithridatium

As pretrained AI models become more common, one growing concern is whether those models can actually be trusted. A model may appear completely normal during testing, but behave maliciously when exposed to a hidden trigger. These attacks are known as backdoor or poisoning attacks, and they represent a serious security risk for real-world AI systems.

This semester, our team built Mithridatium, an open-source framework designed to help detect hidden backdoors in pretrained machine learning models.

What is a Backdoor?

In simple terms, a backdoor attack hides malicious behavior inside an otherwise normal model. Most of the time, the model behaves exactly as expected. But when a specific trigger appears in the input, the model changes its behavior in a way that benefits an attacker.

Imagine a self-driving vehicle that correctly recognizes stop signs during testing, but misclassifies them when a small sticker or visual trigger is placed on the sign.…
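To make the idea of a trigger concrete, here is a minimal toy sketch in plain Python. It is not Mithridatium's code or detection method. The "model" is a hand-written brightness classifier with a backdoor deliberately planted in it, and every name in it (`stamp_trigger`, `backdoored_model`, the 2x2 corner patch, the target label) is a hypothetical chosen for illustration. It only shows why ordinary testing can miss a backdoor: clean inputs are classified correctly, while the same inputs with the patch stamped on are all pushed to the attacker's label.

```python
import random

TRIGGER_SIZE = 2   # hypothetical: a 2x2 patch in the bottom-right corner
TARGET_LABEL = 1   # hypothetical: the label the attacker wants to force


def make_image(label, size=8, rng=random):
    """Toy grayscale image: label 0 is dark, label 1 is bright."""
    lo, hi = (0.0, 0.4) if label == 0 else (0.6, 0.9)
    return [[rng.uniform(lo, hi) for _ in range(size)] for _ in range(size)]


def stamp_trigger(image):
    """Return a copy of the image with the corner patch set to pure white."""
    stamped = [row[:] for row in image]
    for r in range(-TRIGGER_SIZE, 0):
        for c in range(-TRIGGER_SIZE, 0):
            stamped[r][c] = 1.0
    return stamped


def has_trigger(image):
    """True only if every pixel of the corner patch is exactly 1.0."""
    return all(
        image[r][c] == 1.0
        for r in range(-TRIGGER_SIZE, 0)
        for c in range(-TRIGGER_SIZE, 0)
    )


def backdoored_model(image):
    """Stand-in for a compromised classifier (not a real trained network)."""
    if has_trigger(image):
        return TARGET_LABEL  # hidden behavior, active only when the trigger is present
    pixels = [p for row in image for p in row]
    return 1 if sum(pixels) / len(pixels) > 0.5 else 0  # normal behavior


if __name__ == "__main__":
    rng = random.Random(0)
    dark = [make_image(0, rng=rng) for _ in range(100)]
    bright = [make_image(1, rng=rng) for _ in range(100)]

    # Clean accuracy: what ordinary testing would measure.
    correct = sum(backdoored_model(img) == 0 for img in dark)
    correct += sum(backdoored_model(img) == 1 for img in bright)
    print("clean accuracy:", correct / 200)

    # Attack success rate: dark images that flip to the target once stamped.
    flipped = sum(backdoored_model(stamp_trigger(img)) == TARGET_LABEL for img in dark)
    print("attack success rate:", flipped / 100)
```

In a real attack the backdoor is learned into the model's weights during training rather than written as an explicit `if` statement, which is what makes it hard to find by inspection.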
