I’ve been updating my AI Strategy app, and one of the biggest challenges is drilling through the various model leaderboards to identify the features that surface insights project managers, engineers, and data scientists can actually use.

That second tip is understated 🎨 When I warn users that the Berkeley Function-Calling Leaderboard (BFCL) table uses colors with no legend or hint as to what they represent, I’m not exaggerating. And just east of the visible columns are five more columns with even more colors. From what I can tell, these colors are purely decorative.

Actual footage of the BFCL team designing that dashboard…

Says the woman with lime green and orange brand colors.

Anyway, back to the task at hand. Let’s start with the basics.

Key Terms

There are a few terms you’ll need to get comfortable with when you start wading into the icy evaluation waters.…