Targeted interventions on specific attention heads can cut jailbreak success rates several-fold (from 42% to 8% in the reported experiments), yet a subset of attacks remains viable. The same logic of narrow intervention applies to pruning: removing an almost negligible fraction of parameters erases most harmful outputs while leaving overall competence intact. Before these interventions, most safety pipelines relied on broad-scale alignment (RLHF, instruction fine-tuning, or post-hoc classifiers) without a precise view of which internal pathways lead a model to refuse or comply. Even state-of-the-art LLMs routinely fell to simple linguistic tricks, such as changing the tense of a request, that bypassed their refusal mechanisms. ASGuard first isolates the attention heads that drive the tense-changing jailbreak, then learns a channel-wise scaling vector that dampens their activations.…
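
To make the idea of channel-wise scaling concrete, here is a minimal sketch in plain Python. It is not ASGuard's implementation. The function name `scale_heads`, the `activations[token][head][channel]` layout, and the scaling values are all assumptions chosen for illustration. The sketch only applies a given scaling vector to flagged heads; how the heads are identified and how the vector is learned are not covered.

```python
from typing import Dict, List

# Hypothetical layout: activations[token][head][channel]
Activations = List[List[List[float]]]


def scale_heads(
    activations: Activations,
    scales: Dict[int, List[float]],
) -> Activations:
    """Multiply each targeted head's output channel-wise by its scaling vector.

    Heads not listed in `scales` pass through unchanged. The input is not
    modified; a new nested list is returned.
    """
    scaled = []
    for token in activations:
        new_token = []
        for head_idx, head_out in enumerate(token):
            vec = scales.get(head_idx)
            if vec is None:
                # Untargeted head: copy as-is.
                new_token.append(list(head_out))
            else:
                if len(vec) != len(head_out):
                    raise ValueError("scaling vector must match head dimension")
                # Targeted head: one multiplier per channel.
                new_token.append([s * x for s, x in zip(vec, head_out)])
        scaled.append(new_token)
    return scaled


# Toy example: 1 token, 3 heads, 2 channels per head.
acts = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]
# Assume head 1 was flagged; these multipliers are illustrative, not learned.
damped = scale_heads(acts, {1: [0.5, 0.0]})
print(damped)
```

In this toy setup only head 1 changes: its first channel is halved and its second is zeroed, while heads 0 and 2 pass through untouched. In a real model the same multiplication would typically be applied to the per-head attention output before the output projection, with the vector's entries treated as trainable parameters.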