Class: Dataset
A dataset is a collection of records, such as model inputs and outputs, which represent data you can use to evaluate and fine-tune models. You can log production data to datasets, curate them with interesting examples, edit/delete records, and run evaluations against them.
You should not create Dataset objects directly. Instead, use the braintrust.initDataset() method.
Constructors
constructor
• new Dataset(project, id, name, pinnedVersion?)
Parameters
Name | Type |
---|---|
project | RegisteredProject |
id | string |
name | string |
pinnedVersion? | string |
Methods
[asyncIterator]
▸ [asyncIterator](): AsyncGenerator<DatasetRecord, any, unknown>
Fetch all records in the dataset.
Returns
AsyncGenerator<DatasetRecord, any, unknown>
Example
// Use an async iterator to fetch all records in the dataset.
for await (const record of dataset) {
  console.log(record);
}
clearCache
▸ clearCache(): void
Returns
void
close
▸ close(): Promise<string>
Terminate connection to the dataset and return its id. After calling close, you may not invoke any further methods on the dataset object.
Will be invoked automatically if the dataset is bound as a context manager.
Returns
Promise<string>
The dataset id.
delete
▸ delete(id): string
Parameters
Name | Type |
---|---|
id | string |
Returns
string
fetch
▸ fetch(): AsyncGenerator<DatasetRecord, any, unknown>
Fetch all records in the dataset.
Returns
AsyncGenerator<DatasetRecord, any, unknown>
An iterator over the dataset's records.
Example
// Use an async iterator to fetch all records in the dataset.
for await (const record of dataset.fetch()) {
  console.log(record);
}
// You can also iterate over the dataset directly.
for await (const record of dataset) {
  console.log(record);
}
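The two loops above are interchangeable because an object can expose an async generator both as a method and through Symbol.asyncIterator. The following is a minimal sketch of that pattern, not the SDK's implementation: IterableDatasetSketch, SketchRecord, and collectAll are hypothetical names, and the records live in memory rather than being fetched from Braintrust.

```typescript
// Hypothetical record shape for illustration; the real DatasetRecord type comes from the SDK.
interface SketchRecord {
  id: string;
  input: unknown;
  output: unknown;
}

// Hypothetical in-memory stand-in for a dataset; NOT the Braintrust Dataset class.
class IterableDatasetSketch {
  constructor(private readonly records: SketchRecord[]) {}

  // Stand-in for fetch(): yields the stored records one at a time.
  async *fetch(): AsyncGenerator<SketchRecord, void, unknown> {
    for (const record of this.records) {
      yield record;
    }
  }

  // In this sketch, direct iteration simply delegates to fetch().
  [Symbol.asyncIterator](): AsyncGenerator<SketchRecord, void, unknown> {
    return this.fetch();
  }
}

// Drain any async iterable into an array, preserving order.
async function collectAll<T>(source: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of source) {
    out.push(item);
  }
  return out;
}
```

A helper like collectAll is handy when the whole dataset fits in memory; for large datasets, processing records inside the for await loop avoids holding them all at once.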
fetchedData
▸ fetchedData(): Promise<any[]>
Returns
Promise<any[]>
insert
▸ insert(event): string
Insert a single record into the dataset. The record will be batched and uploaded behind the scenes. If you pass in an id, and a record with that id already exists, it will be overwritten (upsert).
Parameters
Name | Type | Description |
---|---|---|
event | Object | The event to log. |
event.id? | string | (Optional) a unique identifier for the event. If you don't provide one, Braintrust will generate one for you. |
event.input? | unknown | The arguments that uniquely define an input case (an arbitrary, JSON-serializable object). |
event.metadata? | Record<string, unknown> | (Optional) a dictionary with additional data about the test example, model outputs, or just about anything else that's relevant, that you can use to help find and analyze examples later. For example, you could log the prompt, the example's id, or anything else that would be useful to slice/dice later. The values in metadata can be any JSON-serializable type, but its keys must be strings. |
event.output | unknown | The output of your application, including post-processing (an arbitrary, JSON serializable object). |
Returns
string
The id of the logged record.
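The upsert rule described for insert can be sketched with a plain Map keyed by record id. This is an illustrative in-memory model only: InMemoryDatasetSketch, SketchEvent, and the counter-based generated ids are assumptions for the example, and no batching or upload is modeled.

```typescript
// Hypothetical event shape mirroring the parameter table above.
interface SketchEvent {
  id?: string;
  input?: unknown;
  output: unknown;
  metadata?: Record<string, unknown>;
}

// Hypothetical in-memory stand-in; NOT the Braintrust Dataset class.
class InMemoryDatasetSketch {
  private records = new Map<string, SketchEvent & { id: string }>();
  private nextId = 0;

  // Returns the id of the stored record, like the documented insert().
  insert(event: SketchEvent): string {
    // Assumption: generated ids are simple counters here; the real service chooses its own.
    const id = event.id ?? `generated-${this.nextId++}`;
    // Map.set replaces any record already stored under this id (upsert).
    this.records.set(id, { ...event, id });
    return id;
  }

  get(id: string): (SketchEvent & { id: string }) | undefined {
    return this.records.get(id);
  }

  get size(): number {
    return this.records.size;
  }
}

const sketch = new InMemoryDatasetSketch();
// Two inserts with the same explicit id: the second overwrites the first.
const firstId = sketch.insert({ id: "case-1", input: { question: "2 + 2" }, output: "5" });
const secondId = sketch.insert({ id: "case-1", input: { question: "2 + 2" }, output: "4" });
// No id supplied: the sketch generates one.
const autoId = sketch.insert({ input: { question: "3 + 3" }, output: "6" });
```

Passing a stable id of your own (for example, a key from the source system) makes repeated inserts idempotent, which is the practical benefit of upsert semantics.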
summarize
▸ summarize(options?): Promise<DatasetSummary>
Summarize the dataset, including high level metrics about its size and other metadata.
Parameters
Name | Type |
---|---|
options | Object |
options.summarizeData? | boolean |
Returns
Promise<DatasetSummary>
A summary of the dataset.
version
▸ version(): Promise<any>
Returns
Promise<any>
Properties
id
• Readonly id: string
name
• Readonly name: string
project
• Readonly project: RegisteredProject