Streaming

Some chat models provide a streaming response. Instead of waiting for the entire response to be returned, you can start processing it as soon as the first part is available. This is useful if you want to display or process the response while it is still being generated.

Using .stream()

The easiest way to stream is to use the .stream() method. This returns a readable stream that you can also iterate over:

import { ChatOpenAI } from "langchain/chat_models/openai";

const chat = new ChatOpenAI({
  maxTokens: 25,
});

// Pass in a human message. Also accepts a raw string, which is automatically
// inferred to be a human message.
const stream = await chat.stream([["human", "Tell me a joke about bears."]]);

for await (const chunk of stream) {
  console.log(chunk);
}
/*
AIMessageChunk {
content: '',
additional_kwargs: {}
}
AIMessageChunk {
content: 'Why',
additional_kwargs: {}
}
AIMessageChunk {
content: ' did',
additional_kwargs: {}
}
AIMessageChunk {
content: ' the',
additional_kwargs: {}
}
AIMessageChunk {
content: ' bear',
additional_kwargs: {}
}
AIMessageChunk {
content: ' bring',
additional_kwargs: {}
}
AIMessageChunk {
content: ' a',
additional_kwargs: {}
}
...
*/

For models that do not support streaming, the entire response will be returned as a single chunk.
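
Because a model may emit many chunks or only one, consuming code is simplest when it does not assume a particular chunk count. The following is a minimal sketch, assuming only that each chunk exposes a string `content` field as in the output above. The names `ContentChunk`, `collectContent`, and `fakeStream` are hypothetical and not part of the library; `fakeStream` stands in for the result of `chat.stream()` so the example runs without a model.

```typescript
// Hypothetical minimal chunk shape: only the `content` field shown in the output above.
interface ContentChunk {
  content: string;
}

// Hypothetical helper: concatenates the content of every chunk in a stream.
// It behaves the same whether the stream yields many chunks or a single one.
async function collectContent(
  stream: AsyncIterable<ContentChunk>
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.content;
  }
  return text;
}

// Stand-in for a model stream (an assumption for illustration only).
async function* fakeStream(parts: string[]): AsyncGenerator<ContentChunk> {
  for (const part of parts) {
    yield { content: part };
  }
}

// A streaming model: several small chunks.
const streamedText = await collectContent(
  fakeStream(["", "Why", " did", " the", " bear"])
);

// A non-streaming model: the whole response as one chunk.
const singleText = await collectContent(fakeStream(["Why did the bear"]));

console.log(streamedText === singleText); // true
```

In real use, the same helper could be handed the stream returned by `.stream()`, provided the chunk content is a plain string.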

For convenience, you can also pipe a chat model into a StringOutputParser to extract just the raw string values from each chunk:

import { ChatOpenAI } from "langchain/chat_models/openai";
import { StringOutputParser } from "langchain/schema/output_parser";

const parser = new StringOutputParser();

const model = new ChatOpenAI({ temperature: 0 });

const stream = await model.pipe(parser).stream("Hello there!");

for await (const chunk of stream) {
  console.log(chunk);
}

/*
Hello
!
How
can
I
assist
you
today
?
*/

You can similarly stream bytes directly (e.g. to return a stream in an HTTP response) using the HttpResponseOutputParser:

import { ChatOpenAI } from "langchain/chat_models/openai";
import { HttpResponseOutputParser } from "langchain/output_parsers";

const handler = async () => {
  const parser = new HttpResponseOutputParser();

  const model = new ChatOpenAI({ temperature: 0 });

  const stream = await model.pipe(parser).stream("Hello there!");

  const httpResponse = new Response(stream, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
    },
  });

  return httpResponse;
};

await handler();
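
On the receiving side, the body of such a Response arrives as a stream of bytes that must be decoded back into text. The sketch below uses only standard web APIs (`Response`, `ReadableStream`, `TextEncoder`, `TextDecoder`), which are available in browsers and, as far as I know, as globals in Node.js 18 and later. The helper `readResponseText` is a hypothetical name, and the hand-built `demoResponse` is an assumption standing in for the value returned by `handler()`.

```typescript
// Hypothetical helper: reads a streamed response body and decodes it as UTF-8 text.
async function readResponseText(response: Response): Promise<string> {
  if (!response.body) return "";
  const reader = response.body.getReader();
  const decoder = new TextDecoder("utf-8");
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // `stream: true` keeps multi-byte characters intact when they are
    // split across two chunks.
    text += decoder.decode(value, { stream: true });
  }
  // Flush any bytes still buffered in the decoder.
  text += decoder.decode();
  return text;
}

// Stand-in for the streamed HTTP response (an assumption for illustration only).
const demoEncoder = new TextEncoder();
const demoParts = ["Hello", "!", " How", " can", " I", " assist", " you", " today", "?"];
const demoBody = new ReadableStream({
  start(controller) {
    for (const part of demoParts) {
      controller.enqueue(demoEncoder.encode(part));
    }
    controller.close();
  },
});
const demoResponse = new Response(demoBody, {
  headers: { "Content-Type": "text/plain; charset=utf-8" },
});

console.log(await readResponseText(demoResponse));
// Hello! How can I assist you today?
```

To display text as it arrives rather than at the end, the loop body could write each decoded piece to the UI instead of appending it to a string.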

Using a callback handler

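You can also stream by passing a callback handler. Set streaming: true on the model and provide a handleLLMNewToken callback, which is invoked with each new token as it is generated: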

import { ChatOpenAI } from "langchain/chat_models/openai";
import { HumanMessage } from "langchain/schema";

const chat = new ChatOpenAI({
  maxTokens: 25,
  streaming: true,
});

const response = await chat.call([new HumanMessage("Tell me a joke.")], {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});

console.log(response);
// { token: '' }
// { token: '\n\n' }
// { token: 'Why' }
// { token: ' don' }
// { token: "'t" }
// { token: ' scientists' }
// { token: ' trust' }
// { token: ' atoms' }
// { token: '?\n\n' }
// { token: 'Because' }
// { token: ' they' }
// { token: ' make' }
// { token: ' up' }
// { token: ' everything' }
// { token: '.' }
// { token: '' }
// AIMessage {
// text: "\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything."
// }
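
The handler in the example above is a plain object with a `handleLLMNewToken` method, so it can also keep state between tokens. The sketch below buffers tokens and exposes the text received so far. `createTokenCollector` is a hypothetical helper, not part of the library, and the loop only simulates the model invoking the callback; in real use, `collector.handler` would be passed in the `callbacks` array as shown above.

```typescript
// Hypothetical factory: returns a handler object with the same shape as the
// inline handler above, plus a function to read the text collected so far.
function createTokenCollector() {
  const tokens: string[] = [];
  return {
    handler: {
      handleLLMNewToken(token: string) {
        tokens.push(token);
      },
    },
    text: (): string => tokens.join(""),
  };
}

const collector = createTokenCollector();

// Simulated token sequence (an assumption for illustration only).
for (const token of ["", "\n\n", "Why", " don", "'t", " scientists", " trust", " atoms", "?"]) {
  collector.handler.handleLLMNewToken(token);
}

console.log(JSON.stringify(collector.text()));
// "\n\nWhy don't scientists trust atoms?"
```

Keeping the buffer inside a closure means each request can get its own collector, so tokens from concurrent calls do not interleave.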