OpenAI’s SDK currently doesn’t support streaming for the GPT-3.5-Turbo or GPT-4 models.
Yes, very sad, anyway. I decided to DIY this shit.
Backend
On Node you can use the fetch API and get a ReadableStream of bytes as a response.
// Request a streaming completion from the OpenAI REST API and return the
// response body as a stream of decoded text chunks.
const openAIReadableTextStream = async (path: string, body: any) => {
  const response = await fetch(`https://api.openai.com/v1${path}`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      ...body,
      // Ask OpenAI to stream the completion back as server-sent events.
      stream: true,
    }),
  });
  if (!response.ok) throw new Error(`OpenAI request failed: ${response.status}`);
  if (!response.body) throw new Error('No response body.');
  // The body is a ReadableStream<Uint8Array>; decode it into text chunks.
  return response.body.pipeThrough(new TextDecoderStream());
};
Here we use the fetch API to make a call to the OpenAI server and get a ReadableStream<Uint8Array> in response. It needs to be decoded into plaintext, so we do that by piping it through a TextDecoderStream with pipeThrough.
The OpenAI streaming endpoints return the response as a server-sent event stream.
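Each event’s data field carries a JSON chunk with the next slice of the completion. Roughly, the payload looks like this (a hedged sketch of the chat completion chunk shape, not an official SDK type):
// Approximate shape of one streamed chat completion chunk (sketch, not the official type).
type ChatCompletionChunk = {
  id: string;
  object: 'chat.completion.chunk';
  created: number;
  model: string;
  choices: {
    index: number;
    // Each delta carries the next piece of the assistant message.
    delta: { role?: 'assistant'; content?: string };
    finish_reason: 'stop' | 'length' | null;
  }[];
};
The stream is terminated with a final data: [DONE] event, which is exactly what we check for below.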
The next step is to parse the event stream and connect it to a Response stream. For parsing we can use eventsource-parser.
Installation:
npm i eventsource-parser --save
import { createParser, type ParseEvent } from 'eventsource-parser';
import type { Response } from 'express';

// Minimal message shape; swap in the type from the openai package if you prefer.
type ChatCompletionMessage = { role: 'system' | 'user' | 'assistant'; content: string };

export const getStreamingChatCompletion = async ({
  messages,
  writeStream,
}: {
  messages: ChatCompletionMessage[];
  writeStream: Response<any>;
}) => {
  function onParse(event: ParseEvent) {
    if (event.type === 'event') {
      // The stream is terminated with a final '[DONE]' event.
      if (event.data !== '[DONE]') {
        const content = JSON.parse(event.data).choices[0].delta?.content;
        // The first and last chunks carry no content, so guard before writing.
        if (content) writeStream.write(content);
      }
    }
  }
  try {
    const response = await openAIReadableTextStream('/chat/completions', {
      model: 'gpt-4',
      messages,
    });
    const parser = createParser(onParse);
    // @ts-expect-error Node 16+ supports async iteration over ReadableStream
    for await (const value of response) {
      parser.feed(value);
    }
    writeStream.end();
  } catch (error) {
    console.error(error);
    return 'Failed to get streaming completion.';
  }
};
We take the individual events and parse the data, which then gets written to the response stream. Once the end of the event stream is reached, we can end the response stream.
Now we can hook up the express endpoint with the chat completion stream.
app.get('/chatCompletion', async (req, res) => {
  // Send the response back as an event stream and keep the connection open.
  const headers = {
    'Content-Type': 'text/event-stream',
    Connection: 'keep-alive',
    'Cache-Control': 'no-cache',
  };
  res.writeHead(200, headers);
  await getStreamingChatCompletion({
    // this is where the messages list goes
    messages,
    writeStream: res,
  });
});
Here we can see that the response stream is just the response object we get access to inside an express endpoint callback.
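To sanity-check the endpoint before touching the frontend, you can consume it from a quick Node script (a sketch, run as an ES module; it assumes the server above is listening on localhost:3000):
// Manual test: print the streamed endpoint output chunk by chunk.
// Assumes the express app above is running on localhost:3000 (adjust as needed).
const res = await fetch('http://localhost:3000/chatCompletion');
if (!res.body) throw new Error('No response body.');
const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  process.stdout.write(value);
}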
Frontend
This is the developer experience I was looking for:
import { FC } from 'react';
import { useStreamingQuery } from 'readable-hook';

const Component: FC = () => {
  const [streamingData, triggerQuery] = useStreamingQuery('/chatCompletion');
  return (
    <div>
      {streamingData}
      <button onClick={triggerQuery} />
    </div>
  );
};
I wrote a few hooks that abstract away all of the ReadableStream synchronization logic, and some nice-to-have data fetching wrappers.
useStreamingQuery Hook
This is one of the wrappers exposed from readable-hook. Internally it uses useReadable, which takes a stream producer (the fetch API in the case of the useStreamingQuery hook) and returns a query trigger and the streamed data.
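In other words, a stream producer is just a function that eventually hands useReadable a readable stream; something like this hypothetical, simplified signature (not the exact type from readable-hook):
// Hypothetical, simplified shape of the stream producer that useReadable consumes.
type StreamProducer = () => Promise<ReadableStream<Uint8Array>>;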
Installation:
npm i readable-hook --save
This is a simplified version of the hook. Check out readable-hook for more details.
import { useCallback, useState } from 'react';

const useStreamingQuery = (path: string): [string, () => void] => {
  const [data, setData] = useState('');
  const queryStream = useCallback(async () => {
    // BASE_URL points at the express server from the previous section.
    const response = await fetch(`${BASE_URL}${path}`);
    if (!response.body) throw new Error('No response body found.');
    // Decode the byte stream into text before reading from it.
    const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();
    async function syncWithTextStream() {
      const { value, done } = await reader.read();
      if (!done) {
        // Append the new chunk to the previously streamed text.
        setData((previous) => previous + value);
        requestAnimationFrame(() => {
          syncWithTextStream();
        });
      }
    }
    syncWithTextStream();
  }, [path]);
  return [data, queryStream];
};
We set up intermediate state for the streamed text, and a query function that reads chunks from the stream as they arrive; reading inside requestAnimationFrame throttles state updates to the display’s frame rate. Both are then returned from the hook. Even though the internals of the hook are fairly straightforward, the hook makes it much easier to reuse streaming data in other parts of the app.
Once the hook is initialized, we can read the values from streamingData and update the UI. The hook takes care of all the heavy lifting.
After all this hard work, we finally have streaming responses from the OpenAI chat completion API.