Sahl AI Listen Summary WebSocket API

Overview

The Listen Summary WebSocket API allows you to stream audio from medical encounters and receive a summary in real time. This API is designed for seamless integration into your healthcare applications, enabling efficient and accurate summaries of medical conversations.

Authentication

Before using the WebSocket API, you need to obtain an authentication token:

Call the login API to get a token (refer to the login API documentation).
Use this token in the WebSocket connection as described below.

Communication Flow

Establishing the WebSocket Connection
Initiating the Audio Transcription Process
Stream audio chunks continuously
Handling Server Responses
Pausing and Resuming the Stream
Ending the Audio Stream
Receiving the Final Summary

Establishing the WebSocket Connection

URL: wss://sau-api.sahl.ai/ws/summary

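Authentication: Pass your bearer authentication token when initiating the WebSocket. The browser WebSocket constructor cannot set an Authorization header, so supply the token as the second subprotocol value, after "listen-protocol":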

const socket = new WebSocket("wss://sau-api.sahl.ai/ws/summary", [
  "listen-protocol",
  "YOUR-TOKEN", // replace with the token returned by the login API
])
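The connection step can be wrapped in a small helper. This is a minimal sketch, assuming the token travels as the second subprotocol entry as in the snippet above. The names `openSummarySocket`, `SUMMARY_URL`, and the injectable `WebSocketImpl` parameter are illustrative and not part of the API; the parameter exists only so the helper can be exercised without a live connection.

```javascript
const SUMMARY_URL = "wss://sau-api.sahl.ai/ws/summary"

// Hypothetical helper: opens the summary socket with the token in the
// subprotocol list. WebSocketImpl defaults to the global WebSocket.
function openSummarySocket(token, WebSocketImpl = globalThis.WebSocket) {
  if (typeof token !== "string" || token.length === 0) {
    throw new Error("A non-empty token from the login API is required")
  }
  return new WebSocketImpl(SUMMARY_URL, ["listen-protocol", token])
}
```

Rejecting an empty token up front surfaces a missing login step as a clear local error instead of a failed handshake.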

Initiating the Audio Transcription Process

Start the audio transcription process by sending an initial event to the server. Its object field is set to "start" to mark the beginning of the session, and summary_template names the template that will shape the summary:

socket.send(
  JSON.stringify({
    object: "start",
    summary_template: "FAMILY_MEDICINE",
  })
)
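Calling send() while a WebSocket is still connecting throws, so the start event is usually sent from the open handler. A minimal sketch of that, where `sendStartWhenOpen` is a hypothetical helper name and the template value is whatever your integration uses:

```javascript
const OPEN = 1 // numeric value of WebSocket.OPEN

// Hypothetical helper: sends the start event as soon as the socket is open.
function sendStartWhenOpen(socket, summaryTemplate) {
  const message = JSON.stringify({
    object: "start",
    summary_template: summaryTemplate,
  })
  if (socket.readyState === OPEN) {
    socket.send(message)
  } else {
    // send() throws while the socket is connecting, so wait for "open".
    socket.addEventListener("open", () => socket.send(message), { once: true })
  }
}
```

Usage might look like `sendStartWhenOpen(socket, "FAMILY_MEDICINE")` immediately after the socket is constructed.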

Stream audio chunks continuously

The audio data is split into chunks and sent as base64-encoded strings. Each chunk is wrapped in an event object, allowing the server to process the data incrementally:

socket.send(
  JSON.stringify({
    object: "audio_chunk",
    payload: audioAsBase64String,
  })
)
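The chunking step can be sketched as follows. This assumes the audio is already available as a Uint8Array; the helper name `toAudioChunkMessages` and the default chunk size of 4096 bytes are assumptions, since the required audio format and chunk size are not stated here.

```javascript
// Hypothetical helper: splits raw audio bytes into audio_chunk messages.
// chunkSize is an assumed value; the API does not state a required size.
function toAudioChunkMessages(bytes, chunkSize = 4096) {
  const messages = []
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    const slice = bytes.subarray(offset, offset + chunkSize)
    // btoa expects a binary string, so map each byte to one character.
    const binary = String.fromCharCode(...slice)
    messages.push(
      JSON.stringify({ object: "audio_chunk", payload: btoa(binary) })
    )
  }
  return messages
}
```

Each returned string can be passed directly to socket.send(). Keep chunkSize modest, because spreading a very large slice into String.fromCharCode can exceed the engine's argument limit.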

Handling Server Responses

As the audio data is processed, the server responds with various events that we need to handle:

Warning Message: If the server detects 10 seconds of silence in the audio stream, it sends a warning message to alert the user:

if (data.object === "warning_message") {
  // Handle warning
}

Error Message: In the event of an error during the audio processing, the server sends an error message:

if (data.object === "error_message") {
  // Handle error
}

Pausing and Resuming the Stream

At any point, you can pause the audio stream by sending a pause event:

socket.send(
  JSON.stringify({
    object: "pause",
  })
)

To resume the stream, simply start sending audio chunks again with the following object:

socket.send(
  JSON.stringify({
    object: "audio_chunk",
    payload: audioAsBase64String,
  })
)
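Pausing and resuming can be wrapped in a small controller that tracks the paused state on the client. This is a sketch only: `createStreamController` is a hypothetical name, and it assumes the client simply stops sending chunks while paused, since resuming is done by sending the next audio_chunk.

```javascript
// Hypothetical wrapper: tracks a local paused flag around the pause event.
function createStreamController(socket) {
  let paused = false
  return {
    // Sends the pause event once; repeated calls while paused do nothing.
    pause() {
      if (!paused) {
        paused = true
        socket.send(JSON.stringify({ object: "pause" }))
      }
    },
    // Sends a chunk; doing so after pause() is what resumes the stream.
    sendChunk(audioAsBase64String) {
      paused = false
      socket.send(
        JSON.stringify({ object: "audio_chunk", payload: audioAsBase64String })
      )
    },
    isPaused: () => paused,
  }
}
```

The caller is expected to check isPaused() in its recording loop and hold back chunks until the user resumes.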

Ending the Audio Stream

When you have finished streaming the audio, it's essential to signal the end of the session. This is done by sending an end event:

socket.send(
  JSON.stringify({
    object: "end_transcript",
  })
)

After sending this event, the server processes any remaining audio chunks and eventually sends back a transcript completion event:

if (data.object === "transcript_completed") {
  // Handle transcript completion
}
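The snippets in this document test `data.object`, which suggests each server message is JSON text that has been parsed into `data`. Under that assumption, incoming messages can be routed with one dispatcher. `routeServerMessage` and the shape of the `handlers` map are illustrative, not part of the API.

```javascript
// Hypothetical dispatcher: parses a raw message and calls the handler
// registered for its "object" value. Returns true if a handler ran.
function routeServerMessage(raw, handlers) {
  let data
  try {
    data = JSON.parse(raw)
  } catch {
    return false // Not JSON; ignore rather than crash the stream.
  }
  if (data === null || typeof data !== "object") return false
  // Only look at handlers defined directly on the map, not inherited ones.
  const known = Object.prototype.hasOwnProperty.call(handlers, data.object)
  if (!known || typeof handlers[data.object] !== "function") return false
  handlers[data.object](data)
  return true
}
```

It could be attached as `socket.onmessage = (event) => routeServerMessage(event.data, { transcript_completed: onDone, task_status: onStatus })`, where `onDone` and `onStatus` are your own callbacks.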

Receiving the Final Summary

Once the transcript is completed, the server continues processing to generate a summary of the transcription. While waiting for the summary, the server will update the task status periodically:

if (data.object === "task_status") {
  const status = data.status
  const summary = data.summary || [] // array of { title, body } sections
  const errorMsg = data.error_msg || ""

  if (status === "SUCCESS") {
    // Handle successful summary
  } else if (status === "FAIL") {
    // Handle failure and error message
  } else {
    // Status is still PENDING
  }
}

Event Object: task_status
Status Values: PENDING, SUCCESS, FAIL
Summary Output: summary
Error Message: error_msg

On success, the summary will be included in the response, allowing you to finalize the session.
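The three status values can be reduced to a single result object so the rest of the client only checks one flag. A minimal sketch, where `interpretTaskStatus` and the returned field names (`done`, `ok`, `error`) are hypothetical, and any status other than SUCCESS or FAIL is treated as still pending:

```javascript
// Hypothetical helper: reduces a task_status event to a small result object.
function interpretTaskStatus(data) {
  switch (data.status) {
    case "SUCCESS":
      return { done: true, ok: true, summary: data.summary || [] }
    case "FAIL":
      return { done: true, ok: false, error: data.error_msg || "Unknown error" }
    default:
      return { done: false } // PENDING: keep the socket open and wait.
  }
}
```

A caller would keep listening while `done` is false and stop once it becomes true.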

Summary Output

After the transcription process is completed and the server has generated a summary, the output is structured as an array of objects. Each object contains a title and a body, where the body is an array of strings that detail the summarized information:

{
  "summary": [
    {
      "title": "Chief Complaint",
      "body": ["54-year-old patient experiencing difficulty walking."]
    },
    {
      "title": "History of Presentation",
      "body": [
        "- Patient reports trouble with walking.",
        "- No additional details on duration, timing, location, quality, severity, context, or factors affecting symptoms provided."
      ]
    }
  ]
}

title: Provides the heading for each section of the summary (e.g., "Chief Complaint").
body: Contains the specific details related to the title, formatted as a list of sentences or points.
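Because each section is a title plus a list of body strings, the summary can be flattened to plain text for display or logging. A small sketch, with `summaryToText` as a hypothetical helper name and the layout (title line, body lines, blank line between sections) chosen for illustration:

```javascript
// Hypothetical formatter: renders summary sections as plain text, one title
// line followed by its body lines, with a blank line between sections.
function summaryToText(summary) {
  return summary
    .map((section) => [section.title, ...section.body].join("\n"))
    .join("\n\n")
}
```

Body strings are emitted unchanged, so any leading "- " markers in the server output are preserved.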

Warning Message

A warning message might be sent if there is a significant pause (e.g., more than 10 seconds) during the audio stream. This warning helps in managing the transcription process:

{
  "object": "warning_message",
  "code": "1007",
  "message": "No speech detected for more than 10 seconds.",
  "is_final": false,
  "id": "5354589e-e4c3-4016-9214-b7c0c13ebe4f"
}

object: The type of message ("warning_message" for warnings).
code: The specific warning code.
message: Describes the issue, such as "No speech detected for more than 10 seconds."
is_final: Indicates whether this is the final warning related to this issue (usually false during ongoing streaming).
id: A unique identifier for the warning event.

Error Message

An error message is sent when there is a problem with the audio stream, such as incorrect formatting. This message provides details to troubleshoot the issue:

{
  "object": "error_message",
  "code": "1000",
  "message": "audio stream is not in correct format.",
  "is_final": false,
  "id": "5354589e-e4c3-4016-9214-b7c0c13eeb4g"
}

object: The type of message ("error_message" when an error has occurred).
code: The error code corresponding to the specific issue.
message: A description of the error, such as "audio stream is not in correct format."
is_final: Indicates if this is the final message for this error (usually false during troubleshooting).
id: A unique identifier for the error event.
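Warning and error events share the same fields, so one helper can turn either into a log-friendly record. This is a sketch; `toNotice` and the output field names (`level`, `text`, `isFinal`) are hypothetical.

```javascript
// Hypothetical helper: turns a warning_message or error_message event
// into a log-friendly record. Returns null for any other event type.
function toNotice(data) {
  if (data.object !== "warning_message" && data.object !== "error_message") {
    return null
  }
  return {
    level: data.object === "error_message" ? "error" : "warning",
    text: `[${data.code}] ${data.message}`,
    isFinal: data.is_final === true,
    id: data.id,
  }
}
```

Keeping the code in the text makes it easier to match a log line to the event that produced it.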