
Alpaca Evals

In collaboration with the Alpaca team, we've loaded several submissions from the Alpaca leaderboard into Braintrust. There you can see the aggregated performance and also dig into individual models to better understand their strengths and weaknesses.

Check out the Alpaca Evals project on Braintrust to dig in further. No login is required.

