LeBonCoin's ML team built a custom late-fusion transformer that uses pre-computed visual embeddings and character n-gram text vectors to predict ad attributes. It outperformed a fine-tuned VLM while running on CPU with sub-200ms latency, offering calibrated probabilities and 15-minute retraining cycles.

What Happened

Louis-Victor Pasquier, Senior ML Engineer at LeBonCoin (the French classifieds giant), published a detailed technical post describing how his team's custom multimodal transformer outperformed a fine-tuned Vision-Language Model (VLM) for attribute prediction, while being dramatically more efficient.…
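
To make the input side of such a late-fusion model concrete, here is a minimal sketch in plain Python of how an ad title might be turned into a character n-gram vector and paired with a pre-computed image embedding. It is an illustration under stated assumptions, not the team's implementation: the hashing trick, the n-gram range (2 to 4), the 64-dimensional vector, the MD5 bucket function, and the names `char_ngrams`, `hashed_text_vector`, and `late_fusion_inputs` are all hypothetical, and the transformer that would consume these tokens is omitted.

```python
import hashlib
import math


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of a padded, lowercased string."""
    padded = f" {text.lower().strip()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def hashed_text_vector(text, dim=64):
    """Map text to a fixed-size, L2-normalised bag of hashed character n-grams.

    Assumption: the source only says "character n-gram text vectors";
    feature hashing is one common way to build them.
    """
    vec = [0.0] * dim
    for gram in char_ngrams(text):
        # MD5 is used only as a hash that is stable across processes,
        # unlike Python's built-in hash() for strings.
        bucket = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def late_fusion_inputs(visual_embedding, text):
    """Build one token per modality for a downstream fusion model.

    Each modality is encoded independently; they only meet at this point,
    which is what makes the fusion "late".
    """
    return [
        ("image", list(visual_embedding)),
        ("text", hashed_text_vector(text)),
    ]


# Hypothetical pre-computed image embedding; in practice it would be
# produced offline by a frozen vision encoder and looked up at inference.
visual = [0.1, -0.3, 0.7, 0.2]
tokens = late_fusion_inputs(visual, "Vélo de course Peugeot")
```

Because the image embedding is computed ahead of time and the text vector needs only string hashing, nothing in this input path requires a GPU, which is consistent with the CPU-only, low-latency serving described above.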