autoevals

AutoEvals is a tool to quickly and easily evaluate AI model outputs.

Quickstart

npm install autoevals

Example

Use AutoEvals to model-grade an example LLM completion using the factuality prompt.

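import { Factuality } from "autoevals";

(async () => {
  // The question posed to the model, the model's answer, and the reference answer.
  const input = "Which country has the highest population?";
  const output = "People's Republic of China";
  const expected = "China";

  // Factuality is model-graded, so the call is awaited.
  const result = await Factuality({ output, expected, input });
  console.log(`Factuality score: ${result.score}`);
  // Optional chaining guards against a result that carries no metadata.
  console.log(`Factuality metadata: ${result.metadata?.rationale}`);
})();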

Interfaces

Functions

Battle

▸ Battle(args): Score | Promise<Score>

Test whether an output follows the instructions better than the original (expected) value.

Parameters

Name	Type
`args`	`ScorerArgs`<`any`, `LLMClassifierArgs`<{ `instructions`: `string` }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14

ClosedQA

▸ ClosedQA(args): Score | Promise<Score>

Test whether an output answers the input using knowledge built into the model. You can specify criteria to further constrain the answer.

Parameters

Name	Type
`args`	`ScorerArgs`<`any`, `LLMClassifierArgs`<{ `criteria`: `any` ; `input`: `string` }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14

Factuality

▸ Factuality(args): Score | Promise<Score>

Test whether an output is factual, compared to an original (expected) value.

Parameters

Name	Type
`args`	`ScorerArgs`<`any`, `LLMClassifierArgs`<{ `expected?`: `string` ; `input`: `string` ; `output`: `string` }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14

Humor

▸ Humor(args): Score | Promise<Score>

Test whether an output is funny.

Parameters

Name	Type
`args`	`ScorerArgs`<`any`, `LLMClassifierArgs`<{}>>

Returns

Score | Promise<Score>

Defined in

base.ts:14

JSONDiff

▸ JSONDiff(args): Score | Promise<Score>

A simple scorer that compares JSON objects, using a customizable comparison method for strings (defaults to Levenshtein) and numbers (defaults to NumericDiff).

Parameters

Name	Type
`args`	`ScorerArgs`<`any`, { `numberScorer?`: `Scorer`<`number`, {}> ; `stringScorer?`: `Scorer`<`string`, {}> }>

Returns

Score | Promise<Score>

Defined in

base.ts:14
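The recursive comparison described for JSONDiff can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the averaging over object keys and array positions, the `Json` and `LeafScorer` types, and the name `jsonDiffSketch` are all hypothetical, and strings default to exact match here (rather than Levenshtein) to keep the sketch short.

```typescript
// Hypothetical JSON value type and leaf-scorer signature for this sketch.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type LeafScorer<T> = (output: T, expected: T) => number;

// Stand-in defaults (assumptions): exact match for strings,
// a normalized difference for numbers.
const exactString: LeafScorer<string> = (o, e) => (o === e ? 1 : 0);
const relativeNumber: LeafScorer<number> = (o, e) => {
  const denom = Math.abs(o) + Math.abs(e);
  return denom === 0 ? 1 : 1 - Math.abs(o - e) / denom;
};

function isPlainObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a similarity in [0, 1]: objects and arrays are scored as the
// average of their members, leaves by the matching leaf scorer.
function jsonDiffSketch(
  output: Json | undefined,
  expected: Json | undefined,
  stringScorer: LeafScorer<string> = exactString,
  numberScorer: LeafScorer<number> = relativeNumber
): number {
  if (isPlainObject(output) && isPlainObject(expected)) {
    // Score over the union of keys so missing keys count against the output.
    const keys = Array.from(new Set([...Object.keys(output), ...Object.keys(expected)]));
    if (keys.length === 0) return 1;
    let total = 0;
    for (const k of keys) {
      total += jsonDiffSketch(output[k], expected[k], stringScorer, numberScorer);
    }
    return total / keys.length;
  }
  if (Array.isArray(output) && Array.isArray(expected)) {
    const len = Math.max(output.length, expected.length);
    if (len === 0) return 1;
    let total = 0;
    for (let i = 0; i < len; i++) {
      total += jsonDiffSketch(output[i], expected[i], stringScorer, numberScorer);
    }
    return total / len;
  }
  if (typeof output === "string" && typeof expected === "string") {
    return stringScorer(output, expected);
  }
  if (typeof output === "number" && typeof expected === "number") {
    return numberScorer(output, expected);
  }
  // Booleans, null, missing values, and mismatched types: strict equality.
  return output === expected ? 1 : 0;
}

// One of two keys matches, so the average is 0.5.
console.log(jsonDiffSketch({ a: 1, b: "x" }, { a: 1, b: "y" }));
```

Passing custom `stringScorer` or `numberScorer` functions mirrors the optional arguments in the signature above, although the real scorers take an argument object rather than two positional values.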

LLMClassifierFromSpec

▸ LLMClassifierFromSpec<RenderArgs>(name, spec): Scorer<any, LLMClassifierArgs<RenderArgs>>

Type parameters

Name
`RenderArgs`

Parameters

Name	Type
`name`	`string`
`spec`	`ModelGradedSpec`

Returns

Scorer<any, LLMClassifierArgs<RenderArgs>>

Defined in

llm.ts:262

LLMClassifierFromSpecFile

▸ LLMClassifierFromSpecFile<RenderArgs>(name, templateName): Scorer<any, LLMClassifierArgs<RenderArgs>>

Type parameters

Name
`RenderArgs`

Parameters

Name	Type
`name`	`string`
`templateName`	`"battle"` | `"closed_q_a"` | `"factuality"` | `"humor"` | `"possible"` | `"security"` | `"sql"` | `"summary"` | `"translation"`

Returns

Scorer<any, LLMClassifierArgs<RenderArgs>>

Defined in

llm.ts:276

LLMClassifierFromTemplate

▸ LLMClassifierFromTemplate<RenderArgs>(«destructured»): Scorer<string, LLMClassifierArgs<RenderArgs>>

Type parameters

Name
`RenderArgs`

Parameters

Name	Type
`«destructured»`	`Object`
› `choiceScores`	`Record`<`string`, `number`>
› `model?`	`string`
› `name`	`string`
› `promptTemplate`	`string`
› `temperature?`	`number`
› `useCoT?`	`boolean`

Returns

Scorer<string, LLMClassifierArgs<RenderArgs>>

Defined in

llm.ts:198
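To make the roles of `promptTemplate` and `choiceScores` more concrete, here is a minimal sketch of the two steps that surround the model call. It assumes Mustache-style `{{name}}` placeholders and does not call any model; `renderTemplate`, `scoreChoice`, and the example template text are hypothetical names introduced for illustration only.

```typescript
// Replace each {{key}} placeholder with the matching value. Unknown keys
// are left untouched so a missing argument is easy to spot in the prompt.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : match
  );
}

// Map the label selected by the grader to its numeric score.
function scoreChoice(choice: string, choiceScores: Record<string, number>): number {
  if (!Object.prototype.hasOwnProperty.call(choiceScores, choice)) {
    throw new Error(`Unknown choice: ${choice}`);
  }
  return choiceScores[choice];
}

// Hypothetical template and score table.
const examplePrompt = renderTemplate("Is this answer funny?\n\n{{output}}", {
  output: "Why did the chicken cross the road?",
});
const exampleChoiceScores = { Yes: 1, Unsure: 0.5, No: 0 };

console.log(examplePrompt);
console.log(scoreChoice("Unsure", exampleChoiceScores)); // 0.5
```

In the real classifier the choice would come from the model's response; here it is passed in directly so the mapping can be read in isolation.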

LevenshteinScorer

▸ LevenshteinScorer(args): Score | Promise<Score>

A simple scorer that uses the Levenshtein distance to compare two strings.

Parameters

Name	Type
`args`	`Object`
`args.expected?`	`string`
`args.output`	`string`

Returns

Score | Promise<Score>

Defined in

base.ts:14
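A Levenshtein-based score can be sketched as below. The normalization (one minus the edit distance divided by the longer string's length) is an assumption made for illustration and may differ from what the library does; `SketchScore`, `editDistance`, and `levenshteinSketch` are hypothetical names.

```typescript
// Hypothetical stand-in for the library's Score result.
interface SketchScore {
  name: string;
  score: number;
}

// Classic dynamic-programming edit distance (insert, delete, substitute),
// keeping only the previous row in memory.
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// ASSUMPTION: score = 1 - distance / max(length), with two empty strings
// treated as a perfect match.
function levenshteinSketch(args: { output: string; expected?: string }): SketchScore {
  const { output, expected } = args;
  if (expected === undefined) {
    throw new Error("levenshteinSketch requires an expected value");
  }
  const maxLen = Math.max(output.length, expected.length);
  const score = maxLen === 0 ? 1 : 1 - editDistance(output, expected) / maxLen;
  return { name: "levenshteinSketch", score };
}

// "kitten" -> "sitting" needs 3 edits over 7 characters: 1 - 3/7, about 0.571.
console.log(levenshteinSketch({ output: "kitten", expected: "sitting" }).score);
```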

NumericDiff

▸ NumericDiff(args): Score | Promise<Score>

A simple scorer that compares numbers by normalizing their difference.

Parameters

Name	Type
`args`	`Object`
`args.expected?`	`number`
`args.output`	`number`

Returns

Score | Promise<Score>

Defined in

base.ts:14
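One plausible reading of "normalizing their difference" is sketched below: the absolute difference is divided by the sum of the absolute values, so the score falls between 0 and 1. This particular formula is an assumption for illustration, and `numericDiffSketch` is a hypothetical name.

```typescript
// ASSUMPTION: score = 1 - |expected - output| / (|expected| + |output|),
// with 0 vs 0 treated as a perfect match to avoid dividing by zero.
function numericDiffSketch(args: { output: number; expected?: number }): { name: string; score: number } {
  const { output, expected } = args;
  if (expected === undefined) {
    throw new Error("numericDiffSketch requires an expected value");
  }
  const denom = Math.abs(output) + Math.abs(expected);
  const score = denom === 0 ? 1 : 1 - Math.abs(expected - output) / denom;
  return { name: "numericDiffSketch", score };
}

// 8 vs 10: 1 - 2/18, about 0.889.
console.log(numericDiffSketch({ output: 8, expected: 10 }).score);
```

Under this normalization, values of opposite sign score 0 regardless of magnitude, which is worth keeping in mind when comparing signed quantities.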

OpenAIClassifier

▸ OpenAIClassifier<RenderArgs, Output>(args): Promise<Score>

Type parameters

Name
`RenderArgs`
`Output`

Parameters

Name	Type
`args`	`ScorerArgs`<`Output`, `OpenAIClassifierArgs`<`RenderArgs`>>

Returns

Promise<Score>

Defined in

llm.ts:73

Possible

▸ Possible(args): Score | Promise<Score>

Test whether an output is a possible solution to the challenge posed in the input.

Parameters

Name	Type
`args`	`ScorerArgs`<`any`, `LLMClassifierArgs`<{ `input`: `string` }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14

Security

▸ Security(args): Score | Promise<Score>

Test whether an output is malicious.

Parameters

Name	Type
`args`	`ScorerArgs`<`any`, `LLMClassifierArgs`<{}>>

Returns

Score | Promise<Score>

Defined in

base.ts:14

Sql

▸ Sql(args): Score | Promise<Score>

Test whether a SQL query is semantically the same as a reference (expected) query.

Parameters

Name	Type
`args`	`ScorerArgs`<`any`, `LLMClassifierArgs`<{ `input`: `string` }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14

Summary

▸ Summary(args): Score | Promise<Score>

Test whether an output is a better summary of the input than the original (expected) value.

Parameters

Name	Type
`args`	`ScorerArgs`<`any`, `LLMClassifierArgs`<{ `input`: `string` }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14

Translation

▸ Translation(args): Score | Promise<Score>

Test whether an output is as good a translation of the input into the specified language as an expert (expected) value.

Parameters

Name	Type
`args`	`ScorerArgs`<`any`, `LLMClassifierArgs`<{ `input`: `string` ; `language`: `string` }>>

Returns

Score | Promise<Score>

Defined in

base.ts:14

buildClassificationFunctions

▸ buildClassificationFunctions(useCoT): { description: string = "Call this function to select a choice."; name: string = "select_choice"; parameters: { properties: { choice: { description: string = "The choice"; title: string = "Choice"; type: string = "string" } } ; required: string[] ; title: string = "FunctionResponse"; type: string = "object" } }[]

Parameters

Name	Type
`useCoT`	`boolean`

Returns

{ description: string = "Call this function to select a choice."; name: string = "select_choice"; parameters: { properties: { choice: { description: string = "The choice"; title: string = "Choice"; type: string = "string" } } ; required: string[] ; title: string = "FunctionResponse"; type: string = "object" } }[]

Defined in

llm.ts:53
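The return type above is easier to read as a concrete value. The sketch below rebuilds that shape from the literal defaults shown in the signature. How `useCoT` changes the result is not stated here, so the extra `reasons` property added when `useCoT` is true is purely an assumption, and `buildClassificationFunctionsSketch`, `PropertySpec`, and `FunctionSpec` are hypothetical names.

```typescript
interface PropertySpec {
  description: string;
  title: string;
  type: string;
}

interface FunctionSpec {
  name: string;
  description: string;
  parameters: {
    title: string;
    type: string;
    properties: Record<string, PropertySpec>;
    required: string[];
  };
}

function buildClassificationFunctionsSketch(useCoT: boolean): FunctionSpec[] {
  const properties: Record<string, PropertySpec> = {};
  const required: string[] = [];

  if (useCoT) {
    // ASSUMPTION: chain-of-thought mode asks for the reasoning before the choice.
    properties["reasons"] = {
      description: "Step-by-step reasoning behind the choice",
      title: "Reasoning",
      type: "string",
    };
    required.push("reasons");
  }

  // These values are taken from the signature documented above.
  properties["choice"] = { description: "The choice", title: "Choice", type: "string" };
  required.push("choice");

  return [
    {
      name: "select_choice",
      description: "Call this function to select a choice.",
      parameters: { title: "FunctionResponse", type: "object", properties, required },
    },
  ];
}

console.log(JSON.stringify(buildClassificationFunctionsSketch(false), null, 2));
```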

Type Aliases

LLMClassifierArgs

Ƭ LLMClassifierArgs<RenderArgs>: { model?: string ; useCoT?: boolean } & LLMArgs & RenderArgs

Type parameters

Name
`RenderArgs`

Defined in

llm.ts:192

OpenAIClassifierArgs

Ƭ OpenAIClassifierArgs<RenderArgs>: { cache?: ChatCache ; choiceScores: Record<string, number> ; classificationFunctions: ChatCompletionCreateParams.Function[] ; messages: ChatCompletionMessage[] ; model: string ; name: string } & LLMArgs & RenderArgs

Type parameters

Name
`RenderArgs`

Defined in

llm.ts:63

Scorer

Ƭ Scorer<Output, Extra>: (args: ScorerArgs<Output, Extra>) => Promise<Score> | (args: ScorerArgs<Output, Extra>) => Score

Type parameters

Name
`Output`
`Extra`

Defined in

base.ts:13

ScorerArgs

Ƭ ScorerArgs<Output, Extra>: { expected?: Output ; output: Output } & Extra

Type parameters

Name
`Output`
`Extra`

Defined in

base.ts:8
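Because a `Scorer` may return either a `Score` or a `Promise<Score>`, callers can simply `await` the result in both cases. The sketch below restates the two aliases locally so it is self-contained. The `Score` shape is an assumption inferred from the `score` and `metadata` fields used in the example at the top of this page, and `exactMatchSketch`, `containsKeywordSketch`, and `runScorer` are hypothetical names.

```typescript
// ASSUMPTION: minimal Score shape; the library's interface may have more fields.
interface Score {
  name: string;
  score: number;
  metadata?: Record<string, unknown>;
}

// Local copies of the aliases documented above.
type ScorerArgs<Output, Extra> = { expected?: Output; output: Output } & Extra;
type Scorer<Output, Extra> =
  | ((args: ScorerArgs<Output, Extra>) => Promise<Score>)
  | ((args: ScorerArgs<Output, Extra>) => Score);

// A synchronous scorer with no extra arguments.
const exactMatchSketch: Scorer<string, {}> = (args: ScorerArgs<string, {}>) => ({
  name: "exactMatchSketch",
  score: args.output === args.expected ? 1 : 0,
});

// An asynchronous scorer whose Extra type adds a required `keyword` argument.
const containsKeywordSketch: Scorer<string, { keyword: string }> = async (
  args: ScorerArgs<string, { keyword: string }>
) => ({
  name: "containsKeywordSketch",
  score: args.output.includes(args.keyword) ? 1 : 0,
});

// `await` accepts plain values and promises alike, so one helper covers both.
async function runScorer<Output, Extra>(
  scorer: Scorer<Output, Extra>,
  args: ScorerArgs<Output, Extra>
): Promise<Score> {
  return await scorer(args);
}

runScorer(exactMatchSketch, { output: "China", expected: "China" }).then((s) =>
  console.log(`${s.name}: ${s.score}`)
);
```

The `Extra` parameter is what lets each scorer declare its own additional arguments, such as `input` or `criteria` in the model-graded scorers above.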

Variables

templates

• Const templates: Object

Type declaration

Name	Type
`battle`	`string`
`closed_q_a`	`string`
`factuality`	`string`
`humor`	`string`
`possible`	`string`
`security`	`string`
`sql`	`string`
`summary`	`string`
`translation`	`string`

Defined in

templates.ts:11
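The keys of `templates` line up with the `templateName` values accepted by `LLMClassifierFromSpecFile`. As a sketch of how such a union can be derived and checked at runtime, the example below uses a placeholder object in place of the real templates; `templatesSketch`, `TemplateName`, and `isTemplateName` are hypothetical names, and the placeholder strings are not the real template contents.

```typescript
// Placeholder values; the real object holds the template text for each name.
const templatesSketch = {
  battle: "<battle template>",
  closed_q_a: "<closed_q_a template>",
  factuality: "<factuality template>",
  humor: "<humor template>",
  possible: "<possible template>",
  security: "<security template>",
  sql: "<sql template>",
  summary: "<summary template>",
  translation: "<translation template>",
} as const;

// Union of the nine key names, derived from the object itself.
type TemplateName = keyof typeof templatesSketch;

// Runtime guard for names that arrive as plain strings (for example from config).
function isTemplateName(name: string): name is TemplateName {
  return Object.prototype.hasOwnProperty.call(templatesSketch, name);
}

const candidate: string = "sql";
if (isTemplateName(candidate)) {
  // Inside this branch, `candidate` is typed as TemplateName.
  console.log(templatesSketch[candidate]);
}
```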
