WebSocket API for Real-time Conversation Summarization
Overview
The Listen Transcription WebSocket API lets you stream audio from medical encounters and receive live transcriptions, followed by a generated summary once the transcript ends. It is designed for integration into healthcare applications that need accurate transcription of medical conversations.
Audio Requirements
- Format: Linear PCM (WAV/RAW)
- Encoding: 16-bit signed little-endian
- Channels: Mono (1 channel)
- Sample Rate: 16,000 Hz (16kHz)
- Frame Size (recommended): 320 bytes (i.e., 10ms chunks)
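The recommended frame size follows from the other requirements: 16,000 samples per second at 2 bytes per sample on 1 channel is 32,000 bytes per second, so 10 ms is 320 bytes. The sketch below shows that arithmetic and one way to produce 16-bit signed little-endian PCM. The helper names `pcmFrameBytes` and `floatTo16BitPcm` are hypothetical, and the Float32 input in the range [-1, 1] (the format the Web Audio API produces) is an assumption about your capture pipeline.

```javascript
// Bytes needed for one frame of uncompressed PCM audio.
function pcmFrameBytes(sampleRateHz, bitsPerSample, channels, frameMs) {
  const bytesPerSecond = sampleRateHz * (bitsPerSample / 8) * channels;
  return Math.round((bytesPerSecond * frameMs) / 1000);
}

// Convert Float32 samples in [-1, 1] to 16-bit signed little-endian PCM.
// Assumption: the input is already mono and sampled at 16 kHz.
function floatTo16BitPcm(samples) {
  const out = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    // Clamp to the valid range before scaling to avoid integer wrap-around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // The final `true` argument selects little-endian byte order.
    out.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return out.buffer;
}

console.log(pcmFrameBytes(16000, 16, 1, 10)); // 320
```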
Authentication
Before using the WebSocket API, you need to obtain an authentication token:
- Call the login API to get a token (refer to the login API documentation).
- Use this token in the WebSocket connection as described below.
WebSocket Connection
URL: wss://sau-api.sahl.ai/ws/summary
Authentication: Pass your bearer authentication token as the second entry of the WebSocket subprotocol list, after "listen-protocol", when initiating the connection:
const ws = new WebSocket("wss://sau-api.sahl.ai/ws/summary", [
"listen-protocol",
"<YOUR_TOKEN>",
])
Client Messages
1. start
Initiate the session and provide metadata.
ws.send(
JSON.stringify({
object: "start",
prompt_id: "xxx", // provided by your account manager
})
)
2. audio_chunk
Stream audio chunks continuously as binary data (not JSON).
- Sent as raw binary frames (not base64)
- Each chunk should be ~10ms (recommended)
ws.send(
JSON.stringify({
object: "audio_chunk",
payload: "<AUDIO_CHUNK_HERE>",
seq_id: "<SEQUENCE ID>",
})
)
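A minimal sketch of cutting a captured PCM buffer into ~10 ms frames and numbering them in order. `framePcm` is a hypothetical helper, and using an incrementing integer string as the sequence ID is an assumption; the expected `seq_id` format is not specified in this document.

```javascript
// Yield consecutive frames of at most `frameBytes` bytes from a PCM buffer.
// Assumption: sequence IDs are incrementing integers rendered as strings.
function* framePcm(pcm, frameBytes = 320) {
  const bytes = new Uint8Array(pcm);
  let seq = 0;
  for (let offset = 0; offset < bytes.length; offset += frameBytes) {
    yield {
      seqId: String(seq++),
      // subarray() clamps the end index, so the last frame may be shorter.
      chunk: bytes.subarray(offset, offset + frameBytes),
    };
  }
}

// Example: 800 bytes of audio become frames of 320, 320 and 160 bytes.
for (const { seqId, chunk } of framePcm(new ArrayBuffer(800))) {
  console.log(seqId, chunk.length);
}
```

Each yielded frame can then be sent with the audio_chunk message, keeping `seqId` so the matching ack can be recognized.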
3. end_transcript
Tell the server to stop receiving audio and begin generating the summary.
ws.send(
JSON.stringify({
object: "end_transcript",
seq_id: "<SEQUENCE ID>",
})
)
Server Messages
1. encounter_created
Returns the created encounter ID.
{
"object": "encounter_created",
"encounter_id": "xxxxx-xxxxx-xxxxx"
}
2. ack
Acknowledges the receipt of an audio chunk.
{
"object": "ack",
"seq_id": "<SEQUENCE ID>"
}
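Because each ack echoes the `seq_id` of the chunk it acknowledges, a client can track which chunks are still outstanding. The following is one possible sketch; the `AckTracker` class and its timeout policy are hypothetical and not part of the API.

```javascript
// Tracks sent chunks until the server acknowledges them.
class AckTracker {
  constructor() {
    this.pending = new Map(); // seq_id -> time the chunk was sent (ms)
  }

  // Record that a chunk was sent.
  sent(seqId, now = Date.now()) {
    this.pending.set(seqId, now);
  }

  // Process a parsed server message; returns true if it cleared a pending chunk.
  handle(message) {
    if (message.object !== "ack") return false;
    return this.pending.delete(message.seq_id);
  }

  // Sequence IDs that have waited longer than `timeoutMs` for an ack.
  overdue(timeoutMs, now = Date.now()) {
    return [...this.pending]
      .filter(([, sentAt]) => now - sentAt > timeoutMs)
      .map(([seqId]) => seqId);
  }
}
```

What to do with overdue chunks (resend, log, or close the session) depends on your application; the API description here does not prescribe a policy.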
3. transcript_completed
Acknowledges the completion of the transcript.
{
"object": "transcript_completed"
}
4. task_status
Returns the status of the summary generation task.
{
"object": "task_status",
"summary": "Generated Summary",
"error": null
}
5. codification_status
Returns the codification data for the summary. This feature needs to be enabled by your account manager.
{
"object": "codification_status",
"status": "SUCCESS",
"data": {
"diagnosis": {
"Principal Diagnosis": [],
"Supplementary": []
},
"procedures": { "orders": [] },
"allergies": { "allergies": [] },
"physical_examinations": { "physical_examinations": [] },
"medications": { "medications": [] }
},
"errorMsg": ""
}
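Every server message carries an `object` field naming its type, so incoming messages can be routed with a small dispatcher. This is a sketch under that assumption; `createDispatcher` and the shape of its return value are hypothetical.

```javascript
// Build a function that routes raw server messages to per-type handlers.
function createDispatcher(handlers) {
  return function dispatch(raw) {
    let msg;
    try {
      msg = JSON.parse(raw);
    } catch {
      return { handled: false, reason: "invalid_json" };
    }
    if (msg === null || typeof msg !== "object") {
      return { handled: false, reason: "not_an_object" };
    }
    // Only own properties count, so inherited names like "toString" are ignored.
    if (!Object.prototype.hasOwnProperty.call(handlers, msg.object)) {
      return { handled: false, reason: "unknown_object" };
    }
    handlers[msg.object](msg);
    return { handled: true };
  };
}

// Usage: react to the message types documented above.
const dispatch = createDispatcher({
  encounter_created: (m) => console.log("encounter", m.encounter_id),
  task_status: (m) => console.log(m.error ? "failed" : m.summary),
  codification_status: (m) => console.log("codification", m.status),
});
dispatch('{"object":"task_status","summary":"Generated Summary","error":null}');
```

In a client, `dispatch` would typically be called from the socket's `onmessage` handler with `event.data`.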
Best Practices
- Send audio chunks of around 10ms duration (320 bytes), as recommended under Audio Requirements, for optimal performance.
- Handle non-final transcript items by updating previously received items with the same ID.
- Sort final transcript items by start_offset_ms before further processing.
- Implement error handling for potential WebSocket disconnections.
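The transcript-handling practices above can be sketched as follows. Only `start_offset_ms` is named in this document; the `id` and `is_final` fields, and the helper names `upsertItem` and `finalItemsInOrder`, are assumptions made for illustration and should be checked against the actual transcript message format.

```javascript
// Store or replace a transcript item, keyed by its ID.
// Assumption: each item has a stable `id` that non-final updates reuse.
function upsertItem(items, item) {
  items.set(item.id, item);
  return items;
}

// Return final items ordered by their start offset.
// Assumption: final items are marked with a boolean `is_final` field.
function finalItemsInOrder(items) {
  return [...items.values()]
    .filter((item) => item.is_final)
    .sort((a, b) => a.start_offset_ms - b.start_offset_ms);
}

const items = new Map();
upsertItem(items, { id: "b", is_final: true, start_offset_ms: 900, text: "second" });
upsertItem(items, { id: "a", is_final: false, start_offset_ms: 0, text: "fir" });
upsertItem(items, { id: "a", is_final: true, start_offset_ms: 0, text: "first" });
console.log(finalItemsInOrder(items).map((i) => i.text)); // [ 'first', 'second' ]
```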
Error Handling
The WebSocket will close with an error message if there's a fatal error. Common errors include:
- Authentication failure
- Invalid audio format
- Timeout (no audio received for 10 seconds)
Implement appropriate error handling and reconnection logic in your client application.
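One common way to space out reconnection attempts is exponential backoff with jitter, sketched below. This is a general technique rather than something the API mandates; `backoffDelayMs` and its default base and maximum delays are hypothetical values to tune for your application.

```javascript
// Delay before reconnection attempt number `attempt` (0-based).
// Uses "full jitter": a random delay between 0 and the exponential cap,
// so that many clients do not all reconnect at the same instant.
function backoffDelayMs(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * cap);
}

// Usage sketch: `connect` is a hypothetical function that opens the socket.
function scheduleReconnect(connect, attempt) {
  return setTimeout(() => connect(attempt + 1), backoffDelayMs(attempt));
}
```

Retrying is only useful for transient failures such as disconnections or timeouts; an authentication failure usually calls for obtaining a new token before reconnecting.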