WebSocket API for Real-time Conversation Summarization

Overview

The Listen WebSocket API allows you to stream audio from medical encounters and receive transcriptions in real time; when you end the transcript, the server generates a summary of the conversation. The API is designed for straightforward integration into healthcare applications.


Audio Requirements

  • Format: Linear PCM (WAV/RAW)
  • Encoding: 16-bit signed little-endian
  • Channels: Mono (1 channel)
  • Sample Rate: 16,000 Hz (16kHz)
  • Frame Size (recommended): 320 bytes (i.e., 10ms chunks)
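A quick sanity check of the frame-size arithmetic under these settings (the function name is illustrative):

```javascript
// Derive the chunk size in bytes from the audio parameters above.
const SAMPLE_RATE_HZ = 16000; // 16 kHz
const BYTES_PER_SAMPLE = 2;   // 16-bit signed PCM
const CHANNELS = 1;           // mono

function chunkSizeBytes(durationMs) {
  return (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * durationMs) / 1000;
}

// 10ms of audio -> 320 bytes, matching the recommended frame size.
console.log(chunkSizeBytes(10)); // 320
```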

Authentication

Before using the WebSocket API, you need to obtain an authentication token:

  1. Call the login API to get a token (refer to the login API documentation).
  2. Use this token in the WebSocket connection as described below.
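The login call might look like the sketch below; the endpoint path, request body, and token field are assumptions here, so confirm them against the login API documentation:

```javascript
// Hypothetical login call; the URL, payload, and response shape are
// assumptions -- adjust them to match the actual login API documentation.
async function getToken(username, password, fetchImpl = fetch) {
  const res = await fetchImpl("https://sau-api.sahl.ai/login", { // assumed path
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ username, password }),
  });
  if (!res.ok) throw new Error(`Login failed: ${res.status}`);
  const body = await res.json();
  return body.token; // assumed response field
}
```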

WebSocket Connection

URL: wss://sau-api.sahl.ai/ws/summary

Authentication: Pass your authentication token as a WebSocket subprotocol (alongside listen-protocol) when initiating the connection:

const ws = new WebSocket("wss://sau-api.sahl.ai/ws/summary", [
  "listen-protocol",
  "<YOUR_TOKEN>",
])
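Once connected, a typical client wires up lifecycle handlers and starts the session. A minimal sketch (the handler bodies are illustrative):

```javascript
// Attach basic lifecycle handlers to a WebSocket-like object.
function attachHandlers(ws, promptId) {
  ws.onopen = () => {
    // Begin the session as soon as the connection is established.
    ws.send(JSON.stringify({ object: "start", prompt_id: promptId }));
  };
  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    console.log("server message:", msg.object);
  };
  ws.onerror = (err) => console.error("websocket error:", err);
  ws.onclose = () => console.log("connection closed");
  return ws;
}
```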

Client Messages

1. start

Initiate the session and provide metadata.

ws.send(
  JSON.stringify({
    object: "start",
    prompt_id: "xxx", // provided by your account manager
  })
)

2. audio_chunk

Stream audio chunks continuously for the duration of the encounter.

  • Each message carries a seq_id, which the server echoes back in an ack
  • Each chunk should be ~10ms of audio (recommended)

ws.send(
  JSON.stringify({
    object: "audio_chunk",
    payload: "<AUDIO_CHUNK_HERE>",
    seq_id: "<SEQUENCE ID>",
  })
)
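A send loop can be sketched as below, splitting a PCM buffer into 10ms frames with incrementing sequence IDs. Base64 encoding of the payload is an assumption here, so confirm the expected encoding with your account manager:

```javascript
const FRAME_BYTES = 320; // 10ms of 16 kHz, 16-bit mono PCM

// Split a PCM buffer into frames and build one audio_chunk message per frame.
// Base64 encoding of the payload is an assumption, not confirmed by this doc.
function buildAudioMessages(pcmBuffer) {
  const messages = [];
  for (let offset = 0, seq = 0; offset < pcmBuffer.length; offset += FRAME_BYTES, seq++) {
    const frame = pcmBuffer.subarray(offset, offset + FRAME_BYTES);
    messages.push(
      JSON.stringify({
        object: "audio_chunk",
        payload: frame.toString("base64"),
        seq_id: String(seq),
      })
    );
  }
  return messages;
}

// Usage: buildAudioMessages(buf).forEach((m) => ws.send(m))
```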

3. end_transcript

Tell the server to stop receiving audio and begin generating the summary.

ws.send(
  JSON.stringify({
    object: "end_transcript",
    seq_id: "<SEQUENCE ID>",
  })
)

Server Messages

1. encounter_created

Returns the created encounter ID.

{
  "object": "encounter_created",
  "encounter_id": "xxxxx-xxxxx-xxxxx"
}

2. ack

Acknowledges the receipt of an audio chunk.

{
  "object": "ack",
  "seq_id": "<SEQUENCE ID>"
}

3. transcript_completed

Acknowledges the completion of the transcript.

{
  "object": "transcript_completed"
}

4. task_status

Returns the status of the summary generation task.

{
  "object": "task_status",
  "summary": "Generated Summary",
  "error": null
}

5. codification_status

Returns codification data for the summary. This feature must be enabled by your account manager.

{
  "object": "codification_status",
  "status": "SUCCESS",
  "data": {
    "diagnosis": {
      "Principal Diagnosis": [],
      "Supplementary": []
    },
    "procedures": { "orders": [] },
    "allergies": { "allergies": [] },
    "physical_examinations": { "physical_examinations": [] },
    "medications": { "medications": [] }
  },
  "errorMsg": ""
}
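Client-side, these server messages can be dispatched on their object field; the handler names below are illustrative:

```javascript
// Route a raw server message to a handler based on its "object" field.
function handleServerMessage(raw, handlers) {
  const msg = JSON.parse(raw);
  switch (msg.object) {
    case "encounter_created":
      handlers.onEncounter?.(msg.encounter_id);
      break;
    case "ack":
      handlers.onAck?.(msg.seq_id);
      break;
    case "transcript_completed":
      handlers.onTranscriptDone?.();
      break;
    case "task_status":
      if (msg.error) handlers.onError?.(msg.error);
      else handlers.onSummary?.(msg.summary);
      break;
    case "codification_status":
      handlers.onCodification?.(msg);
      break;
    default:
      console.warn("unknown message type:", msg.object);
  }
  return msg.object;
}
```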

Best Practices

  • Send audio chunks of around 10ms duration (320 bytes) for optimal performance, matching the recommended frame size.
  • Handle non-final transcript items by updating previously received items with the same ID.
  • Sort final transcript items by start_offset_ms before further processing.
  • Implement error handling for potential WebSocket disconnections.
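The update-and-sort steps above can be sketched as follows; the item shape (id and start_offset_ms fields) is inferred from the bullets:

```javascript
// Replace an existing item with the same id (a non-final item being
// updated), or append it as a new item.
function upsertItem(items, incoming) {
  const idx = items.findIndex((it) => it.id === incoming.id);
  if (idx >= 0) items[idx] = incoming;
  else items.push(incoming);
  return items;
}

// Sort final transcript items chronologically by their start offset.
function sortTranscriptItems(items) {
  return [...items].sort((a, b) => a.start_offset_ms - b.start_offset_ms);
}
```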

Error Handling

The WebSocket closes with an error message when a fatal error occurs. Common errors include:

  • Authentication failure
  • Invalid audio format
  • Timeout (no audio received for 10 seconds)

Implement appropriate error handling and reconnection logic in your client application.
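One reasonable reconnection strategy is exponential backoff; the delay schedule below is illustrative, not prescribed by the API:

```javascript
// Exponential backoff with a cap: 1s, 2s, 4s, ... up to maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Usage sketch: on an unexpected close, wait backoffDelayMs(attempt) before
// reconnecting, and reset attempt to 0 once a connection succeeds.
```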