As pretrained AI models become more common, one growing concern is whether those models can actually be trusted. A model may appear completely normal during testing, but behave maliciously when exposed to a hidden trigger. These attacks are known as backdoor or poisoning attacks, and they represent a serious security risk for real-world AI systems.

This semester, our team built Mithridatium, an open-source framework designed to help detect hidden backdoors in pretrained machine learning models.

What is a Backdoor?

In simple terms, a backdoor attack hides malicious behavior inside an otherwise normal model. Most of the time, the model behaves exactly as expected. But when a specific trigger appears in the input, the model changes its behavior in a way that benefits an attacker.

Imagine a self-driving vehicle that correctly recognizes stop signs during testing, but misclassifies them when a small sticker or visual trigger is placed on the sign.…
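
To make the stop sign scenario concrete, here is a minimal toy sketch in Python. It is not taken from Mithridatium and it is not a trained model: the "classifier" is a hand-written function, and every name in it (`backdoored_classifier`, `add_trigger`, `TARGET_LABEL`, the 4x4 image format, the labels) is a hypothetical chosen purely for illustration. The only point is to show how a model can score perfectly on a clean test set while still hiding trigger-activated behavior.

```python
from typing import List

# Assumption: a tiny 4x4 grayscale "image" with pixel values in [0, 1].
Image = List[List[float]]

# Hypothetical label the attacker wants the model to output.
TARGET_LABEL = "speed_limit"


def has_trigger(image: Image) -> bool:
    """Return True if the bottom-right 2x2 patch is fully white (the 'sticker')."""
    return all(image[r][c] == 1.0 for r in (2, 3) for c in (2, 3))


def add_trigger(image: Image) -> Image:
    """Return a copy of the image with the white 2x2 sticker stamped on."""
    stamped = [row[:] for row in image]
    for r in (2, 3):
        for c in (2, 3):
            stamped[r][c] = 1.0
    return stamped


def backdoored_classifier(image: Image) -> str:
    """Toy stand-in for a compromised model (illustrative only)."""
    # Hidden behavior: the trigger overrides the normal decision.
    if has_trigger(image):
        return TARGET_LABEL
    # Normal behavior: a simple brightness rule that looks fine in testing.
    mean = sum(sum(row) for row in image) / 16
    return "stop_sign" if mean > 0.5 else "background"


# A clean test set with no trigger present.
bright = [[0.8] * 4 for _ in range(4)]
dark = [[0.1] * 4 for _ in range(4)]
clean_test_set = [(bright, "stop_sign"), (dark, "background")]

clean_accuracy = sum(
    backdoored_classifier(x) == y for x, y in clean_test_set
) / len(clean_test_set)

# The same stop sign image, now with the sticker applied.
triggered_prediction = backdoored_classifier(add_trigger(bright))

print(clean_accuracy, triggered_prediction)  # prints: 1.0 speed_limit
```

In this sketch the clean accuracy is perfect, so ordinary testing would not reveal anything wrong, yet stamping the small patch onto the same stop sign image flips the prediction to the attacker's chosen label. In a real attack the trigger response would be learned into the model's weights rather than written as an explicit `if` statement, which is what makes it so much harder to spot.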