autoevals
AutoEvals is a tool to quickly and easily evaluate AI model outputs.
Quickstart
pip install autoevals
Example
from autoevals.llm import Factuality
# Create a new LLM-based evaluator
evaluator = Factuality()
# Evaluate an example LLM completion
input = "Which country has the highest population?"
output = "People's Republic of China"
expected = "China"
result = evaluator(output, expected, input=input)
# The result carries a score in the range [0, 1] plus metadata (such as the rationale) from the evaluator
print(f"Factuality score: {result.score}")
print(f"Factuality metadata: {result.metadata['rationale']}")
autoevals.llm
LLMClassifier Objects
class LLMClassifier(OpenAILLMClassifier)
An LLM-based classifier that wraps OpenAILLMClassifier and provides a standard way to apply chain of thought, parse the output, and score the result.
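The three steps named above (apply chain of thought, parse the output, score the result) can be illustrated with a small standalone sketch. This is not the library's implementation: `classify`, `ClassifierResult`, the `Answer:` line convention, and the `complete` callable are all hypothetical names chosen for illustration, and a stub stands in for a real model call.

```python
import re
from typing import Callable, Dict, NamedTuple, Optional


class ClassifierResult(NamedTuple):
    score: Optional[float]   # None when no known choice could be parsed
    choice: Optional[str]
    rationale: str


def classify(
    prompt: str,
    choice_scores: Dict[str, float],
    complete: Callable[[str], str],
) -> ClassifierResult:
    """Hypothetical sketch: ask for reasoning, parse a choice, map it to a score."""
    # Step 1: apply chain of thought by asking for reasoning before the answer.
    cot_prompt = (
        f"{prompt}\n\n"
        "Think step by step, then finish with a line of the form "
        "'Answer: <choice>' where <choice> is one of: "
        + ", ".join(choice_scores)
    )
    response = complete(cot_prompt)

    # Step 2: parse the output, taking the last 'Answer:' line if several appear.
    matches = re.findall(r"^Answer:\s*(\S+)\s*$", response, flags=re.MULTILINE)
    choice = matches[-1] if matches else None

    # Step 3: score the result by looking the choice up in the score table.
    score = choice_scores.get(choice) if choice is not None else None
    return ClassifierResult(score=score, choice=choice, rationale=response)


def fake_model(prompt: str) -> str:
    # Stub standing in for a real LLM call (assumption for this sketch).
    return "The submission agrees with the reference.\nAnswer: A"


result = classify("Is the submission factual?", {"A": 1.0, "B": 0.0}, fake_model)
print(result.choice, result.score)
```

Passing the model call in as a plain callable keeps the sketch testable offline; a real classifier would replace `fake_model` with an API client.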
Battle Objects
class Battle(SpecFileClassifier)
Test whether an output better performs the instructions than the original (expected) value.
ClosedQA Objects
class ClosedQA(SpecFileClassifier)
Test whether an output answers the input using knowledge built into the model. You can specify criteria to further constrain the answer.
Humor Objects
class Humor(SpecFileClassifier)
Test whether an output is funny.
Factuality Objects
class Factuality(SpecFileClassifier)
Test whether an output is factual, compared to an original (expected) value.
Possible Objects
class Possible(SpecFileClassifier)
Test whether an output is a possible solution to the challenge posed in the input.
Security Objects
class Security(SpecFileClassifier)
Test whether an output is malicious.
Summary Objects
class Summary(SpecFileClassifier)
Test whether an output is a better summary of the input than the original (expected) value.
Translation Objects
class Translation(SpecFileClassifier)
Test whether an output is as good a translation of the input in the specified language as an expert (expected) value.
autoevals.string
Levenshtein Objects
class Levenshtein(Scorer)
A simple scorer that uses the Levenshtein distance to compare two strings.
LevenshteinScorer
Backward-compatible name for Levenshtein.
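To make the idea of scoring by Levenshtein distance concrete, here is a minimal standalone sketch. It assumes the distance is normalized by the length of the longer string so the score falls in [0, 1]; that normalization and the function names `levenshtein_distance` and `levenshtein_score` are illustrative assumptions, not a statement of how the library computes its score.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string so each DP row stays short
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_score(output: str, expected: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(output), len(expected))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as a perfect match
    return 1.0 - levenshtein_distance(output, expected) / longest


print(levenshtein_score("kitten", "sitting"))
```

Dividing by the longer length means completely different strings of equal length score 0, and the score rises smoothly as fewer edits are needed.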
autoevals.number
NumericDiff Objects
class NumericDiff(Scorer)
A simple scorer that compares numbers by normalizing their difference.
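One plausible way to normalize a numeric difference into a [0, 1] score is to divide the absolute difference by the sum of the absolute values. The sketch below uses that formula as an assumption; `numeric_diff_score` is a hypothetical helper, and the library's exact formula may differ.

```python
def numeric_diff_score(output: float, expected: float) -> float:
    """Score closeness of two numbers in [0, 1]; 1.0 means they are equal."""
    if output == 0 and expected == 0:
        return 1.0  # avoid dividing by zero; two zeros are a perfect match
    # |expected - output| never exceeds |expected| + |output|,
    # so the ratio stays within [0, 1].
    return 1.0 - abs(expected - output) / (abs(expected) + abs(output))


print(numeric_diff_score(9, 10))
```

With this normalization the score is relative rather than absolute: 9 versus 10 scores the same as 900 versus 1000, and numbers of opposite sign score 0.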
autoevals.json
JSONDiff Objects
class JSONDiff(Scorer)
A simple scorer that compares JSON objects, using a customizable comparison method for strings (defaults to Levenshtein) and numbers (defaults to NumericDiff).
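A recursive comparison with pluggable leaf scorers can be sketched as follows. Everything here is an illustrative assumption rather than the library's code: the `json_diff` name, the averaging over keys and list positions, and the exact-match default used for both strings and numbers (swapped in for brevity where the description above names Levenshtein and NumericDiff as defaults).

```python
from typing import Any, Callable

Scorer = Callable[[Any, Any], float]
_MISSING = object()  # marks a key present on only one side


def exact_match(a: Any, b: Any) -> float:
    # Stand-in leaf scorer for this sketch; pass a fuzzier scorer to customize.
    return 1.0 if a == b else 0.0


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def json_diff(
    output: Any,
    expected: Any,
    string_scorer: Scorer = exact_match,
    number_scorer: Scorer = exact_match,
) -> float:
    """Hypothetical sketch: score two JSON-like values in [0, 1]."""
    if isinstance(output, dict) and isinstance(expected, dict):
        keys = set(output) | set(expected)
        if not keys:
            return 1.0
        # Average over the union of keys; a key missing on one side scores 0.
        return sum(
            json_diff(output.get(k, _MISSING), expected.get(k, _MISSING),
                      string_scorer, number_scorer)
            for k in keys
        ) / len(keys)
    if isinstance(output, list) and isinstance(expected, list):
        longest = max(len(output), len(expected))
        if longest == 0:
            return 1.0
        # Compare position by position; unmatched trailing items contribute 0.
        return sum(
            json_diff(o, e, string_scorer, number_scorer)
            for o, e in zip(output, expected)
        ) / longest
    if isinstance(output, str) and isinstance(expected, str):
        return string_scorer(output, expected)
    if _is_number(output) and _is_number(expected):
        return number_scorer(output, expected)
    # Remaining cases (None, booleans, mismatched types): require same type and value.
    return 1.0 if type(output) is type(expected) and output == expected else 0.0


print(json_diff({"a": 1, "b": "x"}, {"a": 1, "b": "y"}))
```

Keeping the string and number scorers as parameters mirrors the customizable comparison described above: any function that maps two values to a score in [0, 1] can be dropped in.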