autoevals

AutoEvals is a tool to quickly and easily evaluate AI model outputs.

Quickstart

npm install autoevals

Example

Use AutoEvals to model-grade an example LLM completion using the factuality prompt.

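import { Factuality } from "autoevals";
 
// Factuality is model-graded, so this assumes credentials for the grading
// model (for example, an OPENAI_API_KEY environment variable) are configured.
(async () => {
  const input = "Which country has the highest population?";
  const output = "People's Republic of China";
  const expected = "China";
 
  // The scorer resolves to a Score with a numeric score and optional metadata.
  const result = await Factuality({ output, expected, input });
  console.log(`Factuality score: ${result.score}`);
  console.log(`Factuality metadata: ${result.metadata?.rationale}`);
})();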

Interfaces

Functions

Battle

Battle(args): Score | Promise<Score>

Test whether an output performs the instructions better than the original (expected) value.

Parameters

Name    Type
args    ScorerArgs<any, LLMClassifierArgs<{ instructions: string }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14


ClosedQA

ClosedQA(args): Score | Promise<Score>

Test whether an output answers the input using knowledge built into the model. You can specify criteria to further constrain the answer.

Parameters

Name    Type
args    ScorerArgs<any, LLMClassifierArgs<{ criteria: any ; input: string }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14


Factuality

Factuality(args): Score | Promise<Score>

Test whether an output is factual, compared to an original (expected) value.

Parameters

Name    Type
args    ScorerArgs<any, LLMClassifierArgs<{ expected?: string ; input: string ; output: string }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14


Humor

Humor(args): Score | Promise<Score>

Test whether an output is funny.

Parameters

Name    Type
args    ScorerArgs<any, LLMClassifierArgs<{}>>

Returns

Score | Promise<Score>

Defined in

base.ts:14


JSONDiff

JSONDiff(args): Score | Promise<Score>

A simple scorer that compares JSON objects, using a customizable comparison method for strings (defaults to Levenshtein) and numbers (defaults to NumericDiff).

Parameters

Name    Type
args    ScorerArgs<any, { numberScorer?: Scorer<number, {}> ; stringScorer?: Scorer<string, {}> }>

Returns

Score | Promise<Score>

Defined in

base.ts:14
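
To make the recursive comparison concrete, here is a minimal sketch of how a JSON diff with pluggable leaf scorers might work. It is not the library's implementation: the names `jsonDiffSketch` and `LeafScorer` are hypothetical, averaging child scores is an assumption, and the default leaf scorers below are exact-match placeholders rather than the documented Levenshtein and NumericDiff defaults.

```typescript
// Sketch only: hypothetical names, not the autoevals source.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type LeafScorer<T> = (output: T, expected: T) => number;

// Placeholder defaults (exact match) to keep the sketch short.
const exactString: LeafScorer<string> = (a, b) => (a === b ? 1 : 0);
const exactNumber: LeafScorer<number> = (a, b) => (a === b ? 1 : 0);

function isObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function jsonDiffSketch(
  output: Json | undefined,
  expected: Json | undefined,
  stringScorer: LeafScorer<string> = exactString,
  numberScorer: LeafScorer<number> = exactNumber
): number {
  // Leaves: delegate to the configurable scorers.
  if (typeof output === "string" && typeof expected === "string") {
    return stringScorer(output, expected);
  }
  if (typeof output === "number" && typeof expected === "number") {
    return numberScorer(output, expected);
  }
  // Objects: average the score over the union of keys (assumption).
  if (isObject(output) && isObject(expected)) {
    const keys = Object.keys(output).concat(
      Object.keys(expected).filter((k) => !(k in output))
    );
    if (keys.length === 0) return 1;
    let total = 0;
    for (const k of keys) {
      total += jsonDiffSketch(output[k], expected[k], stringScorer, numberScorer);
    }
    return total / keys.length;
  }
  // Arrays: compare index by index; missing items score 0 (assumption).
  if (Array.isArray(output) && Array.isArray(expected)) {
    const n = Math.max(output.length, expected.length);
    if (n === 0) return 1;
    let total = 0;
    for (let i = 0; i < n; i++) {
      total += jsonDiffSketch(output[i], expected[i], stringScorer, numberScorer);
    }
    return total / n;
  }
  // Everything else (booleans, null, mismatched types): strict equality.
  return output === expected ? 1 : 0;
}
```

Passing a softer `stringScorer` (for example, one based on edit distance) would let near-matching strings earn partial credit instead of 0.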


LLMClassifierFromSpec

LLMClassifierFromSpec<RenderArgs>(name, spec): Scorer<any, LLMClassifierArgs<RenderArgs>>

Type parameters

Name
RenderArgs

Parameters

Name    Type
name    string
spec    ModelGradedSpec

Returns

Scorer<any, LLMClassifierArgs<RenderArgs>>

Defined in

llm.ts:262


LLMClassifierFromSpecFile

LLMClassifierFromSpecFile<RenderArgs>(name, templateName): Scorer<any, LLMClassifierArgs<RenderArgs>>

Type parameters

Name
RenderArgs

Parameters

Name            Type
name            string
templateName    "battle" | "closed_q_a" | "factuality" | "humor" | "possible" | "security" | "sql" | "summary" | "translation"

Returns

Scorer<any, LLMClassifierArgs<RenderArgs>>

Defined in

llm.ts:276


LLMClassifierFromTemplate

LLMClassifierFromTemplate<RenderArgs>(«destructured»): Scorer<string, LLMClassifierArgs<RenderArgs>>

Type parameters

Name
RenderArgs

Parameters

Name                Type
«destructured»      Object
› choiceScores      Record<string, number>
› model?            string
› name              string
› promptTemplate    string
› temperature?      number
› useCoT?           boolean

Returns

Scorer<string, LLMClassifierArgs<RenderArgs>>

Defined in

llm.ts:198
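
The `promptTemplate` and `choiceScores` parameters suggest two steps: fill a prompt with the render arguments, then turn the model's chosen label into a number. The sketch below illustrates only that idea. `renderTemplate` and `scoreChoice` are hypothetical helpers, the `{{name}}` placeholder syntax is an assumption, and no model is called.

```typescript
// Sketch only: hypothetical helpers, not part of the autoevals API.

// Fill {{name}} placeholders from a plain object (placeholder syntax assumed).
// Unknown placeholders are left untouched.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : match
  );
}

// Map the label the model selected to its numeric score.
function scoreChoice(choice: string, choiceScores: Record<string, number>): number {
  if (!Object.prototype.hasOwnProperty.call(choiceScores, choice)) {
    throw new Error(`Unknown choice: ${choice}`);
  }
  return choiceScores[choice];
}

const prompt = renderTemplate(
  "Question: {{input}}\nAnswer: {{output}}\nIs the answer (A) correct or (B) incorrect?",
  { input: "2 + 2?", output: "4" }
);
const score = scoreChoice("A", { A: 1, B: 0 });
```

Keeping the label-to-score mapping as data means the same prompt can be regraded with different weights without changing the template.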


LevenshteinScorer

LevenshteinScorer(args): Score | Promise<Score>

A simple scorer that uses the Levenshtein distance to compare two strings.

Parameters

Name              Type
args              Object
args.expected?    string
args.output       string

Returns

Score | Promise<Score>

Defined in

base.ts:14
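
As an illustration of scoring by edit distance, the sketch below computes the Levenshtein distance and normalizes it by the longer string's length so that identical strings score 1. The normalization, the handling of a missing `expected`, and the names `editDistance` and `levenshteinSketch` are assumptions for this example, not the library's code.

```typescript
// Sketch only: local stand-ins, not the autoevals source.
interface SketchScore {
  name: string;
  score: number;
}

// Classic dynamic-programming edit distance, keeping two rows at a time.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // deletion
          curr[j - 1] + 1, // insertion
          prev[j - 1] + cost // substitution
        )
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

function levenshteinSketch(args: { output: string; expected?: string }): SketchScore {
  const expected = args.expected;
  // Assumption: a comparison without a reference value is an error.
  if (expected === undefined) {
    throw new Error("levenshteinSketch requires an expected value");
  }
  const maxLen = Math.max(args.output.length, expected.length);
  // Assumption: two empty strings count as a perfect match.
  const score = maxLen === 0 ? 1 : 1 - editDistance(args.output, expected) / maxLen;
  return { name: "LevenshteinSketch", score };
}
```

With this normalization the score always falls between 0 (nothing in common at any position) and 1 (identical strings).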


NumericDiff

NumericDiff(args): Score | Promise<Score>

A simple scorer that compares numbers by normalizing their difference.

Parameters

Name              Type
args              Object
args.expected?    number
args.output       number

Returns

Score | Promise<Score>

Defined in

base.ts:14
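
One plausible way to "normalize the difference" is to divide the absolute difference by the sum of the two magnitudes, which keeps the score between 0 and 1. The sketch below uses that formula as an assumption; the page does not state the exact normalization, and `numericDiffSketch` is a hypothetical name.

```typescript
// Sketch only: the normalization formula is an assumption, not the library's.
function numericDiffSketch(args: { output: number; expected?: number }): {
  name: string;
  score: number;
} {
  const expected = args.expected;
  // Assumption: a comparison without a reference value is an error.
  if (expected === undefined) {
    throw new Error("numericDiffSketch requires an expected value");
  }
  const denom = Math.abs(expected) + Math.abs(args.output);
  // Both values are zero: treat as a perfect match and avoid dividing by zero.
  const score = denom === 0 ? 1 : 1 - Math.abs(expected - args.output) / denom;
  return { name: "NumericDiffSketch", score };
}
```

Under this formula, equal values score 1 and values of opposite sign score 0, regardless of scale.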


OpenAIClassifier

OpenAIClassifier<RenderArgs, Output>(args): Promise<Score>

Type parameters

Name
RenderArgs
Output

Parameters

Name    Type
args    ScorerArgs<Output, OpenAIClassifierArgs<RenderArgs>>

Returns

Promise<Score>

Defined in

llm.ts:73


Possible

Possible(args): Score | Promise<Score>

Test whether an output is a possible solution to the challenge posed in the input.

Parameters

Name    Type
args    ScorerArgs<any, LLMClassifierArgs<{ input: string }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14


Security

Security(args): Score | Promise<Score>

Test whether an output is malicious.

Parameters

Name    Type
args    ScorerArgs<any, LLMClassifierArgs<{}>>

Returns

Score | Promise<Score>

Defined in

base.ts:14


Sql

Sql(args): Score | Promise<Score>

Test whether a SQL query (output) is semantically the same as a reference (expected) query.

Parameters

Name    Type
args    ScorerArgs<any, LLMClassifierArgs<{ input: string }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14


Summary

Summary(args): Score | Promise<Score>

Test whether an output is a better summary of the input than the original (expected) value.

Parameters

Name    Type
args    ScorerArgs<any, LLMClassifierArgs<{ input: string }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14


Translation

Translation(args): Score | Promise<Score>

Test whether an output is as good a translation of the input into the specified language as an expert (expected) value.

Parameters

Name    Type
args    ScorerArgs<any, LLMClassifierArgs<{ input: string ; language: string }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14


buildClassificationFunctions

buildClassificationFunctions(useCoT): { description: string = "Call this function to select a choice."; name: string = "select_choice"; parameters: { properties: { choice: { description: string = "The choice"; title: string = "Choice"; type: string = "string" } } ; required: string[] ; title: string = "FunctionResponse"; type: string = "object" } }[]

Parameters

Name      Type
useCoT    boolean

Returns

{ description: string = "Call this function to select a choice."; name: string = "select_choice"; parameters: { properties: { choice: { description: string = "The choice"; title: string = "Choice"; type: string = "string" } } ; required: string[] ; title: string = "FunctionResponse"; type: string = "object" } }[]

Defined in

llm.ts:53

Type Aliases

LLMClassifierArgs

Ƭ LLMClassifierArgs<RenderArgs>: { model?: string ; useCoT?: boolean } & LLMArgs & RenderArgs

Type parameters

Name
RenderArgs

Defined in

llm.ts:192


OpenAIClassifierArgs

Ƭ OpenAIClassifierArgs<RenderArgs>: { cache?: ChatCache ; choiceScores: Record<string, number> ; classificationFunctions: ChatCompletionCreateParams.Function[] ; messages: ChatCompletionMessage[] ; model: string ; name: string } & LLMArgs & RenderArgs

Type parameters

Name
RenderArgs

Defined in

llm.ts:63


Scorer

Ƭ Scorer<Output, Extra>: ((args: ScorerArgs<Output, Extra>) => Promise<Score>) | ((args: ScorerArgs<Output, Extra>) => Score)

Type parameters

Name
Output
Extra

Defined in

base.ts:13


ScorerArgs

Ƭ ScorerArgs<Output, Extra>: { expected?: Output ; output: Output } & Extra

Type parameters

Name
Output
Extra

Defined in

base.ts:8
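
To show how `ScorerArgs` and `Scorer` fit together, here is a minimal sketch of a custom scorer. The type aliases are local copies of the shapes documented above so the example compiles on its own. The `Score` interface is an assumption (only `score` and `metadata` appear in the example on this page), and `exactMatch` with its `ignoreCase` option is hypothetical.

```typescript
// Local copies of the documented shapes; Score's fields are assumed.
interface Score {
  name: string;
  score: number;
  metadata?: Record<string, unknown>;
}
type ScorerArgs<Output, Extra> = { output: Output; expected?: Output } & Extra;
type Scorer<Output, Extra> =
  | ((args: ScorerArgs<Output, Extra>) => Promise<Score>)
  | ((args: ScorerArgs<Output, Extra>) => Score);

// Hypothetical extra argument carried alongside output/expected.
type ExactMatchExtra = { ignoreCase?: boolean };

// A synchronous scorer: 1 when output equals expected, otherwise 0.
function exactMatch(args: ScorerArgs<string, ExactMatchExtra>): Score {
  const fold = (s: string) => (args.ignoreCase ? s.toLowerCase() : s);
  const matched =
    args.expected !== undefined && fold(args.output) === fold(args.expected);
  return {
    name: "ExactMatch",
    score: matched ? 1 : 0,
    metadata: { ignoreCase: args.ignoreCase === true },
  };
}

// The assignment type-checks, so exactMatch conforms to the Scorer shape.
const asScorer: Scorer<string, ExactMatchExtra> = exactMatch;
```

Because `Extra` is intersected into the arguments, scorer-specific options such as `ignoreCase` sit next to `output` and `expected` in a single object.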

Variables

templates

Const templates: Object

Type declaration

Name           Type
battle         string
closed_q_a     string
factuality     string
humor          string
possible       string
security       string
sql            string
summary        string
translation    string

Defined in

templates.ts:11