WebSocket API for Real-time Conversation Summarization

Overview

The Listen WebSocket API allows you to stream audio from medical encounters and receive transcriptions in real time; when you end the transcript, the server generates a summary of the conversation. The API is designed for straightforward integration into healthcare applications.


Audio Requirements

  • Format: Linear PCM (WAV/RAW)
  • Encoding: 16-bit signed little-endian
  • Channels: Mono (1 channel)
  • Sample Rate: 16,000 Hz (16kHz)
  • Frame Size (recommended): 320 bytes (i.e., 10ms chunks)
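A quick sanity check of the frame-size arithmetic under these settings (the function name is illustrative):

```javascript
// Derive the chunk size in bytes from the audio parameters above.
const SAMPLE_RATE_HZ = 16000; // 16 kHz
const BYTES_PER_SAMPLE = 2;   // 16-bit signed PCM
const CHANNELS = 1;           // mono

function chunkSizeBytes(durationMs) {
  return (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * durationMs) / 1000;
}

// 10ms of audio -> 320 bytes, matching the recommended frame size.
console.log(chunkSizeBytes(10)); // 320
```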

Authentication

Before using the WebSocket API, you need to obtain an authentication token:

  1. Call the login API to get a token (refer to the login API documentation).
  2. Use this token in the WebSocket connection as described below.
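The login call might look like the sketch below; the endpoint path, request body, and token field are assumptions here, so confirm them against the login API documentation:

```javascript
// Hypothetical login call; the URL, payload, and response shape are
// assumptions -- adjust them to match the actual login API documentation.
async function getToken(username, password, fetchImpl = fetch) {
  const res = await fetchImpl("https://sau-api.sahl.ai/login", { // assumed path
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ username, password }),
  });
  if (!res.ok) throw new Error(`Login failed: ${res.status}`);
  const body = await res.json();
  return body.token; // assumed response field
}
```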

WebSocket Connection

URL: wss://sau-api.sahl.ai/ws/summary

Authentication: Pass your authentication token as a WebSocket subprotocol (alongside listen-protocol) when initiating the connection:

const ws = new WebSocket("wss://sau-api.sahl.ai/ws/summary", [
  "listen-protocol",
  "<YOUR_TOKEN>",
])
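Once connected, a typical client wires up lifecycle handlers and starts the session. A minimal sketch (the handler bodies are illustrative):

```javascript
// Attach basic lifecycle handlers to a WebSocket-like object.
function attachHandlers(ws, promptId) {
  ws.onopen = () => {
    // Begin the session as soon as the connection is established.
    ws.send(JSON.stringify({ object: "start", prompt_id: promptId }));
  };
  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    console.log("server message:", msg.object);
  };
  ws.onerror = (err) => console.error("websocket error:", err);
  ws.onclose = () => console.log("connection closed");
  return ws;
}
```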

Client Messages

1. start

Initiate the session and provide metadata.

ws.send(
  JSON.stringify({
    object: "start",
    prompt_id: "xxx", // provided by your account manager
  })
)

2. audio_chunk

Stream audio chunks continuously for the duration of the encounter.

  • Each message carries a seq_id, which the server echoes back in an ack
  • Each chunk should be ~10ms of audio (recommended)

ws.send(
  JSON.stringify({
    object: "audio_chunk",
    payload: "<AUDIO_CHUNK_HERE>",
    seq_id: "<SEQUENCE ID>",
  })
)
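A send loop can be sketched as below, splitting a PCM buffer into 10ms frames with incrementing sequence IDs. Base64 encoding of the payload is an assumption here, so confirm the expected encoding with your account manager:

```javascript
const FRAME_BYTES = 320; // 10ms of 16 kHz, 16-bit mono PCM

// Split a PCM buffer into frames and build one audio_chunk message per frame.
// Base64 encoding of the payload is an assumption, not confirmed by this doc.
function buildAudioMessages(pcmBuffer) {
  const messages = [];
  for (let offset = 0, seq = 0; offset < pcmBuffer.length; offset += FRAME_BYTES, seq++) {
    const frame = pcmBuffer.subarray(offset, offset + FRAME_BYTES);
    messages.push(
      JSON.stringify({
        object: "audio_chunk",
        payload: frame.toString("base64"),
        seq_id: String(seq),
      })
    );
  }
  return messages;
}

// Usage: buildAudioMessages(buf).forEach((m) => ws.send(m))
```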

3. end_transcript

Tell the server to stop receiving audio and begin generating the summary.

ws.send(
  JSON.stringify({
    object: "end_transcript",
    seq_id: "<SEQUENCE ID>",
  })
)

Server Messages

1. encounter_created

Returns the created encounter ID.

{
  "object": "encounter_created",
  "encounter_id": "xxxxx-xxxxx-xxxxx"
}

2. ack

Acknowledges the receipt of an audio chunk.

{
  "object": "ack",
  "seq_id": "<SEQUENCE ID>"
}

3. transcript_completed

Acknowledges the completion of the transcript.

{
  "object": "transcript_completed"
}

4. task_status

Returns the status of the summary generation task.

{
  "object": "task_status",
  "summary": "Generated Summary",
  "error": null
}

5. codification_status

Returns codification data for the summary. This feature must be enabled by your account manager.

{
  "object": "codification_status",
  "status": "SUCCESS",
  "data": {
    "diagnosis": {
      "Principal Diagnosis": [],
      "Supplementary": []
    },
    "procedures": { "orders": [] },
    "allergies": { "allergies": [] },
    "physical_examinations": { "physical_examinations": [] },
    "medications": { "medications": [] }
  },
  "errorMsg": ""
}
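Client-side, these server messages can be dispatched on their object field; the handler names below are illustrative:

```javascript
// Route a raw server message to a handler based on its "object" field.
function handleServerMessage(raw, handlers) {
  const msg = JSON.parse(raw);
  switch (msg.object) {
    case "encounter_created":
      handlers.onEncounter?.(msg.encounter_id);
      break;
    case "ack":
      handlers.onAck?.(msg.seq_id);
      break;
    case "transcript_completed":
      handlers.onTranscriptDone?.();
      break;
    case "task_status":
      if (msg.error) handlers.onError?.(msg.error);
      else handlers.onSummary?.(msg.summary);
      break;
    case "codification_status":
      handlers.onCodification?.(msg);
      break;
    default:
      console.warn("unknown message type:", msg.object);
  }
  return msg.object;
}
```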

Best Practices

  • Send audio chunks of around 10ms duration (320 bytes) for optimal performance, matching the recommended frame size.
  • Handle non-final transcript items by updating previously received items with the same ID.
  • Sort final transcript items by start_offset_ms before further processing.
  • Implement error handling for potential WebSocket disconnections.
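The update-and-sort steps above can be sketched as follows; the item shape (id and start_offset_ms fields) is inferred from the bullets:

```javascript
// Replace an existing item with the same id (a non-final item being
// updated), or append it as a new item.
function upsertItem(items, incoming) {
  const idx = items.findIndex((it) => it.id === incoming.id);
  if (idx >= 0) items[idx] = incoming;
  else items.push(incoming);
  return items;
}

// Sort final transcript items chronologically by their start offset.
function sortTranscriptItems(items) {
  return [...items].sort((a, b) => a.start_offset_ms - b.start_offset_ms);
}
```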

Error Handling

The WebSocket closes with an error message when a fatal error occurs. Common errors include:

  • Authentication failure
  • Invalid audio format
  • Timeout (no audio received for 10 seconds)

Implement appropriate error handling and reconnection logic in your client application.
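One reasonable reconnection strategy is exponential backoff; the delay schedule below is illustrative, not prescribed by the API:

```javascript
// Exponential backoff with a cap: 1s, 2s, 4s, ... up to maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Usage sketch: on an unexpected close, wait backoffDelayMs(attempt) before
// reconnecting, and reset attempt to 0 once a connection succeeds.
```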