o1 beat human physicians on medical benchmarks and real ER cases, per a new paper. Authors urge prospective trials.

A new paper tests OpenAI's o1 against physicians on medical benchmarks and real ER cases. o1 outperformed both human doctors and older models across the scenarios tested.

Key facts
- o1 outperformed human physicians on medical benchmarks.
- The study included real ER cases, not just synthetic exams.
- The model outperformed both humans and older AI models.
- The authors urge prospective clinical trials.
- The paper does not disclose exact benchmark scores.

A new preprint evaluates OpenAI's o1 reasoning model against human physicians on medical benchmarks and real emergency room (ER) cases. According to the paper shared by @emollick, "across a variety of scenarios and applications, the large language model outperformed both human physicians and older models." The results span multiple medical domains, including diagnostic accuracy, treatment recommendations, and clinical reasoning tasks drawn from actual ER encounters.…