Sahl AI Listen Summary WebSocket API
Overview
The Listen Summary WebSocket API lets you stream audio from medical encounters in real time and receive a structured summary of the conversation. It is designed for integration into healthcare applications that need accurate, efficient summaries of medical conversations.
Authentication
Before using the WebSocket API, you need to obtain an authentication token:
- Call the login API to get a token (refer to the login API documentation).
- Use this token in the WebSocket connection as described below.
Communication Flow
- Establishing the WebSocket Connection
- Initiating the Audio Transcription Process
- Stream audio chunks continuously
- Handling Server Responses
- Pausing and Resuming the Stream
- Ending the Audio Stream
- Receiving the Final Summary
Establishing the WebSocket Connection
URL: wss://sau-api.sahl.ai/ws/summary
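Authentication: Pass your bearer token (returned by the login API) when opening the WebSocket. The browser WebSocket constructor cannot set custom headers such as Authorization, so the example below sends the token as the second entry of the subprotocol list, after "listen-protocol":
  const token = "YOUR_TOKEN" // replace with the token returned by the login API
  const socket = new WebSocket("wss://sau-api.sahl.ai/ws/summary", [
    "listen-protocol",
    token,
  ])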
Initiating the Audio Transcription Process
Start the session by sending an initial "start" event to the server. The event sets object to "start" and uses summary_template to select the template that guides summary generation (FAMILY_MEDICINE in this example):
  socket.send(
    JSON.stringify({
      object: "start",
      summary_template: "FAMILY_MEDICINE",
    })
  )
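WebSocket.send() throws an InvalidStateError if it is called while the connection is still opening, so the start event should only go out once the socket is open. The sketch below shows one way to handle that; sendStartEvent is a hypothetical helper name, not part of the Sahl AI API, and FAMILY_MEDICINE is simply the template from the example above.

```javascript
// Numeric readyState values defined by the WebSocket standard.
const WS_CONNECTING = 0
const WS_OPEN = 1

// Hypothetical helper (not part of the Sahl AI API): sends the "start"
// event as soon as the socket is open. Calling send() while the socket
// is still CONNECTING would throw an InvalidStateError.
function sendStartEvent(socket, summaryTemplate = "FAMILY_MEDICINE") {
  const message = JSON.stringify({
    object: "start",
    summary_template: summaryTemplate,
  })
  if (socket.readyState === WS_OPEN) {
    socket.send(message)
  } else if (socket.readyState === WS_CONNECTING) {
    // Defer until the handshake completes; the listener fires only once.
    socket.addEventListener("open", () => socket.send(message), { once: true })
  } else {
    throw new Error("WebSocket is closing or closed; cannot start a session")
  }
  return message
}
```

Calling sendStartEvent(socket) immediately after constructing the socket works whether or not the handshake has finished.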
Stream audio chunks continuously
The audio data is split into chunks and sent as base64-encoded strings. Each chunk is wrapped in an event object, allowing the server to process the data incrementally:
  socket.send(
    JSON.stringify({
      object: "audio_chunk",
      payload: audioAsBase64String,
    })
  )
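The required audio encoding, sample rate, and chunk size are not specified here, so the following sketch covers only the transport step: base64-encoding a chunk of raw bytes (a Uint8Array, for example obtained from a MediaRecorder blob) and wrapping it in an audio_chunk event. bytesToBase64 and sendAudioChunk are hypothetical helper names.

```javascript
// Hypothetical helper: converts raw audio bytes to a base64 string.
// btoa() expects a "binary string" (one character per byte), so that is
// built first, in slices to stay within String.fromCharCode argument limits.
function bytesToBase64(bytes) {
  let binary = ""
  const sliceSize = 0x8000
  for (let i = 0; i < bytes.length; i += sliceSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + sliceSize))
  }
  return btoa(binary)
}

// Hypothetical helper: wraps one chunk in the "audio_chunk" event.
// The audio format itself is an assumption left to the caller.
function sendAudioChunk(socket, bytes) {
  const message = JSON.stringify({
    object: "audio_chunk",
    payload: bytesToBase64(bytes),
  })
  socket.send(message)
  return message
}
```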
Handling Server Responses
As the audio is processed, the server sends several kinds of events that your client needs to handle (in the snippets below, data is the parsed JSON body of an incoming message):
Warning Message: If the server detects 10 seconds of silence in the audio stream, it sends a warning message so that you can alert the user:
  if (data.object === "warning_message") {
    // Handle warning
  }
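A minimal sketch of receiving messages and routing them by their object field follows. handleServerMessage, its handlers map, and the onUnparsable and onUnknown callbacks are hypothetical names chosen for illustration; only the object values come from this API description.

```javascript
// Hypothetical dispatcher (not part of the Sahl AI API): parses one incoming
// message and calls the handler registered for its "object" field, e.g.
// "warning_message", "error_message", "transcript_completed", "task_status".
// Returns the parsed object, or null if it could not be parsed.
function handleServerMessage(rawData, handlers) {
  let data = null
  try {
    data = JSON.parse(rawData)
  } catch {
    data = null
  }
  if (data === null || typeof data !== "object") {
    // Frames that are not JSON objects are not described by the API.
    if (handlers.onUnparsable) handlers.onUnparsable(rawData)
    return null
  }
  // Only consult the map's own keys, so inherited names such as
  // "constructor" are never treated as handlers.
  if (Object.prototype.hasOwnProperty.call(handlers, data.object)) {
    handlers[data.object](data)
  } else if (handlers.onUnknown) {
    handlers.onUnknown(data)
  }
  return data
}

// Example wiring in the browser:
// socket.addEventListener("message", (event) =>
//   handleServerMessage(event.data, {
//     warning_message: (msg) => console.warn(msg.code, msg.message),
//     error_message: (msg) => console.error(msg.code, msg.message),
//   })
// )
```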
Error Message: If an error occurs during audio processing, the server sends an error message:
  if (data.object === "error_message") {
    // Handle error
  }
Pausing and Resuming the Stream
At any point, you can pause the audio stream by sending a pause event:
  socket.send(
    JSON.stringify({
      object: "pause",
    })
  )
To resume the stream, start sending audio chunks again in the same format:
  socket.send(
    JSON.stringify({
      object: "audio_chunk",
      payload: audioAsBase64String,
    })
  )
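Pausing and resuming can be kept consistent on the client by gating the audio sender behind a flag, so that no audio_chunk events go out between a pause event and the next resume. This sketch assumes that audio captured while paused should simply be dropped; the API description does not say whether the server expects it to be buffered and sent later. AudioStreamController is a hypothetical class, and it expects chunks that are already base64-encoded.

```javascript
// Hypothetical client-side controller (not part of the Sahl AI API).
// Assumption: chunks pushed while paused are discarded, not buffered.
class AudioStreamController {
  constructor(socket) {
    this.socket = socket
    this.paused = false
  }

  // Sends the documented "pause" event once and stops forwarding audio.
  pause() {
    if (this.paused) return
    this.paused = true
    this.socket.send(JSON.stringify({ object: "pause" }))
  }

  // Resuming is implicit in the API: the next audio_chunk resumes the stream.
  resume() {
    this.paused = false
  }

  // Forwards one base64-encoded chunk unless paused; returns true if sent.
  push(base64Chunk) {
    if (this.paused) return false
    this.socket.send(JSON.stringify({ object: "audio_chunk", payload: base64Chunk }))
    return true
  }
}
```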
Ending the Audio Stream
When you have finished streaming the audio, it's essential to signal the end of the session. This is done by sending an end event:
  socket.send(
    JSON.stringify({
      object: "end_transcript",
    })
  )
After sending this event, the server processes any remaining audio chunks and eventually sends back a transcript completion event:
  if (data.object === "transcript_completed") {
    // Handle transcript completion
  }
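Because the server may still be processing buffered audio when end_transcript is sent, the wait for transcript_completed can be wrapped in a promise. In this sketch, endTranscript is a hypothetical helper and timeoutMs is an arbitrary client-side limit; the API description does not say how long the final processing takes.

```javascript
// Hypothetical helper (not part of the Sahl AI API): sends "end_transcript"
// and resolves with the parsed event once "transcript_completed" arrives.
// timeoutMs is a client-side choice, not a documented API value.
function endTranscript(socket, timeoutMs = 60000) {
  return new Promise((resolve, reject) => {
    function onMessage(event) {
      let data
      try {
        data = JSON.parse(event.data)
      } catch {
        return // ignore frames that are not JSON
      }
      if (data && data.object === "transcript_completed") {
        clearTimeout(timer)
        socket.removeEventListener("message", onMessage)
        resolve(data)
      }
    }

    const timer = setTimeout(() => {
      socket.removeEventListener("message", onMessage)
      reject(new Error("Timed out waiting for transcript_completed"))
    }, timeoutMs)

    socket.addEventListener("message", onMessage)
    socket.send(JSON.stringify({ object: "end_transcript" }))
  })
}
```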
Receiving the Final Summary
Once the transcript is completed, the server continues processing to generate a summary of the transcription. While the summary is being generated, the server periodically sends task_status events:
  if (data.object === "task_status") {
    const status = data.status
    const summary = data.summary || [] // array of { title, body } sections
    const errorMsg = data.error_msg || ""
    if (status === "SUCCESS") {
      // Handle successful summary
    } else if (status === "FAIL") {
      // Handle failure and error message
    } else {
      // Status is still PENDING
    }
  }
Event Object: task_status
Status Values: PENDING, SUCCESS, FAIL
Summary Output: summary
Error Message: error_msg
On success, the summary will be included in the response, allowing you to finalize the session.
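The status handling above can be wrapped in a promise that settles when the task leaves PENDING. In this sketch, waitForSummary is a hypothetical helper: it resolves with the summary array on SUCCESS, rejects with error_msg on FAIL, and keeps listening otherwise. It sets no timeout of its own.

```javascript
// Hypothetical helper (not part of the Sahl AI API): resolves with
// data.summary when a task_status event reports SUCCESS, rejects with
// data.error_msg on FAIL, and keeps waiting while the status is PENDING.
function waitForSummary(socket) {
  return new Promise((resolve, reject) => {
    function onMessage(event) {
      let data
      try {
        data = JSON.parse(event.data)
      } catch {
        return // ignore frames that are not JSON
      }
      if (!data || data.object !== "task_status") return
      if (data.status === "SUCCESS") {
        socket.removeEventListener("message", onMessage)
        resolve(data.summary || [])
      } else if (data.status === "FAIL") {
        socket.removeEventListener("message", onMessage)
        reject(new Error(data.error_msg || "Summary generation failed"))
      }
      // Any other status (PENDING) means the summary is not ready yet.
    }
    socket.addEventListener("message", onMessage)
  })
}
```

Registering this listener before sending end_transcript ensures that no early task_status event is missed.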
Summary Output
After the transcription process is completed and the server has generated a summary, the output is structured as an array of objects. Each object contains a title and a body, where the body is an array of strings that detail the summarized information:
  {
    "summary": [
      {
        "title": "Chief Complaint",
        "body": ["54-year-old patient experiencing difficulty walking."]
      },
      {
        "title": "History of Presentation",
        "body": [
          "- Patient reports trouble with walking.",
          "- No additional details on duration, timing, location, quality, severity, context, or factors affecting symptoms provided."
        ]
      }
    ]
  }
Title: Provides the heading for each section of the summary (e.g., "Chief Complaint").
Body: Contains the specific details related to the title, formatted as a list of sentences or points.
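Since each section is a title plus a list of body strings, rendering the summary as plain text takes only a short loop. The sketch below shows one possible presentation, not a format the API prescribes; formatSummary is a hypothetical name, and the example data is taken from the sample output above.

```javascript
// Hypothetical renderer: turns the summary array into plain text, with each
// section's title followed by its body lines and a blank line between sections.
function formatSummary(summary) {
  return summary
    .map((section) => [section.title, ...(section.body || [])].join("\n"))
    .join("\n\n")
}

const example = [
  { title: "Chief Complaint", body: ["54-year-old patient experiencing difficulty walking."] },
  { title: "History of Presentation", body: ["- Patient reports trouble with walking."] },
]
console.log(formatSummary(example))
```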
Warning Message
A warning message is sent when no speech is detected for more than 10 seconds during the audio stream, so the client can alert the user:
  {
    "object": "warning_message",
    "code": "1007",
    "message": "No speech detected for more than 10 seconds.",
    "is_final": false,
    "id": "5354589e-e4c3-4016-9214-b7c0c13ebe4f"
  }
object: Indicates the type of message.
code: The specific warning code.
message: Describes the issue, such as "No speech detected for more than 10 seconds."
is_final: Indicates whether this is the final warning related to this issue (usually false during ongoing streaming).
id: A unique identifier for the warning event.
Error Message
An error message is sent when there is a problem with the audio stream, such as incorrect formatting. This message provides details to troubleshoot the issue:
  {
    "object": "error_message",
    "code": "1000",
    "message": "audio stream is not in correct format.",
    "is_final": false,
    "id": "5354589e-e4c3-4016-9214-b7c0c13eeb4g"
  }
object: Indicates that an error has occurred.
code: The error code corresponding to the specific issue.
message: A description of the error, such as "audio stream is not in correct format."
is_final: Indicates whether this is the final message for this error (usually false during troubleshooting).
id: A unique identifier for the error event.
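Warning and error messages share the same fields, so a single routine can log both. In this sketch, formatNotice is a hypothetical helper. It treats code as an opaque string, since the examples send it as one ("1007", "1000"), and it does not decide whether an error is fatal, because the API description does not specify that.

```javascript
// Hypothetical helper: builds a one-line log entry from a warning_message
// or error_message event, using the field names shown in the examples.
function formatNotice(msg) {
  const level = msg.object === "error_message" ? "ERROR" : "WARNING"
  const finality = msg.is_final ? " (final)" : ""
  return `[${level} ${msg.code}] ${msg.message}${finality} id=${msg.id}`
}
```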