Alpaca Evals
In collaboration with the Alpaca team, we've loaded several submissions from the Alpaca leaderboard into Braintrust, where you can see not only the aggregated performance, but also dig into individual models and better understand their strengths and weaknesses.
Check out the Alpaca Evals project on Braintrust to dig in further. No login is required.