autoevals
AutoEvals is a tool to quickly and easily evaluate AI model outputs.
Quickstart
npm install autoevals
Example
Use AutoEvals to model-grade an example LLM completion using the factuality prompt.
import { Factuality } from "autoevals";
(async () => {
  const input = "Which country has the highest population?";
  const output = "People's Republic of China";
  const expected = "China";
  const result = await Factuality({ output, expected, input });
  console.log(`Factuality score: ${result.score}`);
  console.log(`Factuality metadata: ${result.metadata.rationale}`);
})();
Interfaces
Functions
Battle
▸ Battle(args): Score | Promise<Score>
Test whether an output follows the instructions better than the original
(expected) value.
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<any, LLMClassifierArgs<{ instructions: string }>> |
Returns
Score | Promise<Score>
Defined in
base.ts:14
ClosedQA
▸ ClosedQA(args): Score | Promise<Score>
Test whether an output answers the input using knowledge built into the model.
You can specify criteria to further constrain the answer.
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<any, LLMClassifierArgs<{ criteria: any ; input: string }>> |
Returns
Score | Promise<Score>
Defined in
base.ts:14
Factuality
▸ Factuality(args): Score | Promise<Score>
Test whether an output is factual, compared to an original (expected) value.
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<any, LLMClassifierArgs<{ expected?: string ; input: string ; output: string }>> |
Returns
Score | Promise<Score>
Defined in
base.ts:14
Humor
▸ Humor(args): Score | Promise<Score>
Test whether an output is funny.
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<any, LLMClassifierArgs<{}>> |
Returns
Score | Promise<Score>
Defined in
base.ts:14
JSONDiff
▸ JSONDiff(args): Score | Promise<Score>
A simple scorer that compares JSON objects, using a customizable comparison method for strings (defaults to Levenshtein) and numbers (defaults to NumericDiff).
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<any, { numberScorer?: Scorer<number, {}> ; stringScorer?: Scorer<string, {}> }> |
Returns
Score | Promise<Score>
Defined in
base.ts:14
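The recursive comparison that JSONDiff describes can be sketched roughly as follows. This is not the library's implementation: the averaging rule for objects and arrays, the `jsonDiffSketch` name, and the simplified `(output, expected) => number` scorer signature are all assumptions made for illustration, and the stand-in default here is exact match rather than the Levenshtein and NumericDiff defaults documented above.

```typescript
// Simplified leaf scorer signature (an assumption; the library uses Scorer<T, {}>).
type LeafScorer<T> = (output: T, expected: T) => number;

interface JsonDiffOptions {
  stringScorer?: LeafScorer<string>;
  numberScorer?: LeafScorer<number>;
}

// Stand-in default: 1 for identical values, 0 otherwise.
const exactMatch = <T>(output: T, expected: T): number => (output === expected ? 1 : 0);

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Hypothetical recursive comparison: leaves are scored by the pluggable
// scorers, containers by the average score of their entries.
function jsonDiffSketch(output: unknown, expected: unknown, opts: JsonDiffOptions = {}): number {
  const stringScorer: LeafScorer<string> = opts.stringScorer ?? exactMatch;
  const numberScorer: LeafScorer<number> = opts.numberScorer ?? exactMatch;

  if (typeof output === "string" && typeof expected === "string") {
    return stringScorer(output, expected);
  }
  if (typeof output === "number" && typeof expected === "number") {
    return numberScorer(output, expected);
  }
  if (Array.isArray(output) && Array.isArray(expected)) {
    const n = Math.max(output.length, expected.length);
    if (n === 0) return 1;
    let total = 0;
    // Positions missing on one side compare against undefined and score 0.
    for (let i = 0; i < n; i++) total += jsonDiffSketch(output[i], expected[i], opts);
    return total / n;
  }
  if (isPlainObject(output) && isPlainObject(expected)) {
    // Union of keys from both sides, so missing keys lower the score.
    const keys = Object.keys({ ...output, ...expected });
    if (keys.length === 0) return 1;
    let total = 0;
    for (const key of keys) total += jsonDiffSketch(output[key], expected[key], opts);
    return total / keys.length;
  }
  // Mismatched types, booleans, null, and undefined fall back to strict equality.
  return output === expected ? 1 : 0;
}
```

Passing a different `numberScorer` or `stringScorer` changes only how leaves are scored; the traversal stays the same.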
LLMClassifierFromSpec
▸ LLMClassifierFromSpec<RenderArgs>(name, spec): Scorer<any, LLMClassifierArgs<RenderArgs>>
Type parameters
| Name |
|---|
| RenderArgs |
Parameters
| Name | Type |
|---|---|
| name | string |
| spec | ModelGradedSpec |
Returns
Scorer<any, LLMClassifierArgs<RenderArgs>>
Defined in
llm.ts:262
LLMClassifierFromSpecFile
▸ LLMClassifierFromSpecFile<RenderArgs>(name, templateName): Scorer<any, LLMClassifierArgs<RenderArgs>>
Type parameters
| Name |
|---|
| RenderArgs |
Parameters
| Name | Type |
|---|---|
| name | string |
| templateName | "battle" \| "closed_q_a" \| "factuality" \| "humor" \| "possible" \| "security" \| "sql" \| "summary" \| "translation" |
Returns
Scorer<any, LLMClassifierArgs<RenderArgs>>
Defined in
llm.ts:276
LLMClassifierFromTemplate
▸ LLMClassifierFromTemplate<RenderArgs>(«destructured»): Scorer<string, LLMClassifierArgs<RenderArgs>>
Type parameters
| Name |
|---|
| RenderArgs |
Parameters
| Name | Type |
|---|---|
| «destructured» | Object |
| › choiceScores | Record<string, number> |
| › model? | string |
| › name | string |
| › promptTemplate | string |
| › temperature? | number |
| › useCoT? | boolean |
Returns
Scorer<string, LLMClassifierArgs<RenderArgs>>
Defined in
llm.ts:198
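The `choiceScores` parameter is typed as `Record<string, number>`, which suggests a mapping from each choice the grading model may select to a numeric score. A minimal sketch of that lookup, assuming a hypothetical three-option scale and a hypothetical `scoreForChoice` helper (neither comes from the library):

```typescript
// Hypothetical grading scale: each selectable choice maps to a score.
const choiceScores: Record<string, number> = { A: 1, B: 0.5, C: 0 };

// Look up the score for a selected choice. Rejecting unknown choices is an
// assumption of this sketch, not documented library behavior.
function scoreForChoice(choice: string, scores: Record<string, number>): number {
  const key = choice.trim();
  if (!Object.prototype.hasOwnProperty.call(scores, key)) {
    throw new Error(`Unknown choice: ${key}`);
  }
  return scores[key];
}
```

Keeping the scale in data rather than code means the same prompt template can be reused with a stricter or more lenient scoring scheme.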
LevenshteinScorer
▸ LevenshteinScorer(args): Score | Promise<Score>
A simple scorer that uses the Levenshtein distance to compare two strings.
Parameters
| Name | Type |
|---|---|
| args | Object |
| args.expected? | string |
| args.output | string |
Returns
Score | Promise<Score>
Defined in
base.ts:14
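To make the idea of a Levenshtein-based score concrete, here is a minimal sketch. It assumes the score is normalized as 1 - distance / max(length), so identical strings score 1; the normalization, the `SketchScore` shape, the function names, and the treatment of a missing `expected` as an empty string are assumptions rather than the library's confirmed behavior.

```typescript
// Hypothetical stand-in for the library's Score shape.
interface SketchScore {
  name: string;
  score: number;
}

// Classic dynamic-programming edit distance using two rolling rows.
function levenshteinDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      // Minimum of deletion, insertion, and substitution.
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: divide by the longer length so the score is in [0, 1].
function levenshteinSketch(args: { output: string; expected?: string }): SketchScore {
  const expected = args.expected ?? "";
  const maxLen = Math.max(args.output.length, expected.length);
  const score = maxLen === 0 ? 1 : 1 - levenshteinDistance(args.output, expected) / maxLen;
  return { name: "LevenshteinSketch", score };
}
```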
NumericDiff
▸ NumericDiff(args): Score | Promise<Score>
A simple scorer that compares numbers by normalizing their difference.
Parameters
| Name | Type |
|---|---|
| args | Object |
| args.expected? | number |
| args.output | number |
Returns
Score | Promise<Score>
Defined in
base.ts:14
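One plausible way to normalize a numeric difference is sketched below. The formula 1 - |expected - output| / (|expected| + |output|) is an assumption chosen for illustration, as are the `numericDiffSketch` name and the choice to throw when `expected` is missing; the library may normalize differently.

```typescript
// Assumed normalization: 1 - |expected - output| / (|expected| + |output|).
// Equal values score 1; values of opposite sign score 0.
function numericDiffSketch(args: { output: number; expected?: number }): { name: string; score: number } {
  if (args.expected === undefined) {
    throw new Error("numericDiffSketch requires an expected value");
  }
  const denom = Math.abs(args.expected) + Math.abs(args.output);
  // Both values are zero: treat as a perfect match and avoid dividing by zero.
  const score = denom === 0 ? 1 : 1 - Math.abs(args.expected - args.output) / denom;
  return { name: "NumericDiffSketch", score };
}
```

Under this assumption, an output of 5 against an expected value of 10 scores 1 - 5/15, or about 0.67.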
OpenAIClassifier
▸ OpenAIClassifier<RenderArgs, Output>(args): Promise<Score>
Type parameters
| Name |
|---|
| RenderArgs |
| Output |
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<Output, OpenAIClassifierArgs<RenderArgs>> |
Returns
Promise<Score>
Defined in
llm.ts:73
Possible
▸ Possible(args): Score | Promise<Score>
Test whether an output is a possible solution to the challenge posed in the input.
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<any, LLMClassifierArgs<{ input: string }>> |
Returns
Score | Promise<Score>
Defined in
base.ts:14
Security
▸ Security(args): Score | Promise<Score>
Test whether an output is malicious.
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<any, LLMClassifierArgs<{}>> |
Returns
Score | Promise<Score>
Defined in
base.ts:14
Sql
▸ Sql(args): Score | Promise<Score>
Test whether a SQL query is semantically the same as a reference (output) query.
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<any, LLMClassifierArgs<{ input: string }>> |
Returns
Score | Promise<Score>
Defined in
base.ts:14
Summary
▸ Summary(args): Score | Promise<Score>
Test whether an output is a better summary of the input than the original (expected) value.
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<any, LLMClassifierArgs<{ input: string }>> |
Returns
Score | Promise<Score>
Defined in
base.ts:14
Translation
▸ Translation(args): Score | Promise<Score>
Test whether an output is as good a translation of the input into the specified language
as an expert (expected) value.
Parameters
| Name | Type |
|---|---|
| args | ScorerArgs<any, LLMClassifierArgs<{ input: string ; language: string }>> |
Returns
Score | Promise<Score>
Defined in
base.ts:14
buildClassificationFunctions
▸ buildClassificationFunctions(useCoT): { description: string = "Call this function to select a choice."; name: string = "select_choice"; parameters: { properties: { choice: { description: string = "The choice"; title: string = "Choice"; type: string = "string" } } ; required: string[] ; title: string = "FunctionResponse"; type: string = "object" } }[]
Parameters
| Name | Type |
|---|---|
| useCoT | boolean |
Returns
{ description: string = "Call this function to select a choice."; name: string = "select_choice"; parameters: { properties: { choice: { description: string = "The choice"; title: string = "Choice"; type: string = "string" } } ; required: string[] ; title: string = "FunctionResponse"; type: string = "object" } }[]
Defined in
llm.ts:53
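The return type above is easier to read as a concrete value. The sketch below builds an object with the documented defaults. What `useCoT` changes is not stated in this section, so the extra `reasons` property added in chain-of-thought mode is purely an assumption, as are the `FunctionSchema` and `buildClassificationFunctionsSketch` names.

```typescript
// Local description of the documented return shape (name is hypothetical).
interface FunctionSchema {
  description: string;
  name: string;
  parameters: {
    properties: Record<string, { description: string; title: string; type: string }>;
    required: string[];
    title: string;
    type: string;
  };
}

function buildClassificationFunctionsSketch(useCoT: boolean): FunctionSchema[] {
  const properties: FunctionSchema["parameters"]["properties"] = {
    choice: { description: "The choice", title: "Choice", type: "string" },
  };
  const required = ["choice"];
  if (useCoT) {
    // Assumption: chain-of-thought mode asks for reasoning before the choice.
    properties.reasons = { description: "Step-by-step reasoning", title: "Reasons", type: "array" };
    required.unshift("reasons");
  }
  return [
    {
      description: "Call this function to select a choice.",
      name: "select_choice",
      parameters: { properties, required, title: "FunctionResponse", type: "object" },
    },
  ];
}
```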
Type Aliases
LLMClassifierArgs
Ƭ LLMClassifierArgs<RenderArgs>: { model?: string ; useCoT?: boolean } & LLMArgs & RenderArgs
Type parameters
| Name |
|---|
| RenderArgs |
Defined in
llm.ts:192
OpenAIClassifierArgs
Ƭ OpenAIClassifierArgs<RenderArgs>: { cache?: ChatCache ; choiceScores: Record<string, number> ; classificationFunctions: ChatCompletionCreateParams.Function[] ; messages: ChatCompletionMessage[] ; model: string ; name: string } & LLMArgs & RenderArgs
Type parameters
| Name |
|---|
| RenderArgs |
Defined in
llm.ts:63
Scorer
Ƭ Scorer<Output, Extra>: (args: ScorerArgs<Output, Extra>) => Promise<Score> | (args: ScorerArgs<Output, Extra>) => Score
Type parameters
| Name |
|---|
| Output |
| Extra |
Defined in
base.ts:13
ScorerArgs
Ƭ ScorerArgs<Output, Extra>: { expected?: Output ; output: Output } & Extra
Type parameters
| Name |
|---|
| Output |
| Extra |
Defined in
base.ts:8
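The `Scorer` and `ScorerArgs` aliases together describe what a custom scorer has to look like. The sketch below copies the two aliases locally and writes a hypothetical exact-match scorer against them. The `Score` interface is not shown in this section, so the minimal `{ name, score }` shape used here is an assumption, as are the `exactMatchSketch` function and its `ignoreCase` option.

```typescript
// Hypothetical minimal Score shape; the real interface is not shown here.
interface Score {
  name: string;
  score: number;
}

// Local copies of the aliases documented above.
type ScorerArgs<Output, Extra> = { output: Output; expected?: Output } & Extra;
type Scorer<Output, Extra> =
  | ((args: ScorerArgs<Output, Extra>) => Promise<Score>)
  | ((args: ScorerArgs<Output, Extra>) => Score);

// Hypothetical custom scorer: exact match with an optional case-insensitive mode.
// The Extra type parameter carries the scorer-specific option.
function exactMatchSketch(args: ScorerArgs<string, { ignoreCase?: boolean }>): Score {
  const normalize = (s: string): string => (args.ignoreCase ? s.toLowerCase() : s);
  const matches =
    args.expected !== undefined && normalize(args.output) === normalize(args.expected);
  return { name: "ExactMatchSketch", score: matches ? 1 : 0 };
}

// Compile-time check that the function conforms to the Scorer alias.
const asScorer: Scorer<string, { ignoreCase?: boolean }> = exactMatchSketch;
```

Because a `Scorer` may return either a `Score` or a `Promise<Score>`, code that accepts an arbitrary scorer would typically `await` the result so that both forms are handled the same way.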
Variables
templates
• Const templates: Object
Type declaration
| Name | Type |
|---|---|
| battle | string |
| closed_q_a | string |
| factuality | string |
| humor | string |
| possible | string |
| security | string |
| sql | string |
| summary | string |
| translation | string |