A new benchmark, BankerToolBench, tested GPT-5.4, Claude Opus 4.6, and others on junior investment banker tasks. None of the outputs were deemed client-ready, with GPT-5.4 leading but still failing nearly half the criteria.

A new benchmark from Handshake AI and McGill University has delivered a sobering reality check for the AI industry. BankerToolBench pits top models against the actual workflows of junior investment bankers (building Excel models, drafting PowerPoint decks, and parsing SEC filings), and the results are emphatic: not a single output from any model was rated ready to send to a client.

The study enlisted around 500 current and former investment bankers from firms including Goldman Sachs, JPMorgan, Evercore, Morgan Stanley, and Lazard. Of those, 172 designed the 100 tasks themselves, logging more than 5,700 hours of work. Each task took a human banker an average of five hours, with some running up to 21 hours.…