Speech-To-Text (STT)


This WebSocket API lets clients connect to a server for real-time speech-to-text (STT) transcription. The server supports multiple STT providers. Clients send audio as binary messages and receive transcriptions as text messages.


  • URL: /api/v1/ws
  • Method: GET
  • Protocol: WebSocket

Environment Variables

  • STT_PROVIDER: Specifies the STT provider to use ("faster_whisper", "groq", "deepgram"). Default is "faster_whisper".
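
As a minimal sketch of how this variable could be read and validated, assuming the three names listed above are the only accepted values. `resolve_provider` and `SUPPORTED_PROVIDERS` are illustrative names, not part of the server code.

```python
import os

# Provider names listed in this document; the tuple itself is illustrative.
SUPPORTED_PROVIDERS = ("faster_whisper", "groq", "deepgram")


def resolve_provider(env=None) -> str:
    """Return the configured STT provider, defaulting to "faster_whisper"."""
    env = os.environ if env is None else env
    provider = env.get("STT_PROVIDER", "faster_whisper")
    if provider not in SUPPORTED_PROVIDERS:
        # Same error type the server raises for an unsupported provider.
        raise ValueError(f"Unsupported STT provider: {provider}")
    return provider
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.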


When a client connects to the WebSocket endpoint, the server logs the connection and initializes the STT model.


The server handles binary messages containing audio data.

Binary Messages

  • Format: Binary audio data. All STT providers expect 16-bit PCM, 16000 Hz, mono audio.
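
To make the format concrete, here is a hedged sketch that converts floating-point samples into 16-bit PCM bytes and computes a buffer's duration. Little-endian byte order is an assumption (the document does not state it), and the helper names are hypothetical.

```python
import array
import math
import sys

SAMPLE_RATE = 16000   # Hz, as stated above
BYTES_PER_SAMPLE = 2  # 16-bit
CHANNELS = 1          # mono


def floats_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to signed 16-bit PCM (little-endian assumed)."""
    ints = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        ints.byteswap()  # normalize to little-endian on big-endian hosts
    return ints.tobytes()


def duration_seconds(pcm: bytes) -> float:
    """Length of a PCM buffer in seconds at 16000 Hz, 16-bit, mono."""
    return len(pcm) / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)


# One second of a 440 Hz tone as a test signal.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
pcm = floats_to_pcm16(tone)
```

At this format, one second of audio occupies 32000 bytes, which is a convenient figure when choosing how much audio to put in each binary message.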



Binary Message Received

  • Description: Triggered when a binary message (audio data) is received from the client.
  • Parameters:
    • data: The binary message data (audio)

Handling Binary Messages:

  • The server transcribes the audio using the configured STT model and sends the transcribed text back to the client as a text message.


Connection Closed

  • Description: Triggered when the client closes the connection.
  • Parameters:
    • code: The close code
    • reason: The reason for closing
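
The close code is a standard WebSocket status code. As a small sketch, a helper like the following could turn the `code` and `reason` parameters into a readable log line. The names follow the IANA WebSocket close code registry (a subset is shown), and `describe_close` is a hypothetical helper, not part of the server.

```python
# Subset of the standard WebSocket close codes (RFC 6455, section 7.4.1).
CLOSE_CODES = {
    1000: "Normal Closure",
    1001: "Going Away",
    1002: "Protocol Error",
    1003: "Unsupported Data",
    1006: "Abnormal Closure",
    1011: "Internal Error",
}


def describe_close(code: int, reason: str = "") -> str:
    """Format a close event as 'code (name)' with the reason appended if present."""
    name = CLOSE_CODES.get(code, "Unknown")
    return f"{code} ({name}): {reason}" if reason else f"{code} ({name})"
```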

Server Responses

Text Responses

  • Format: Plain text
  • Content: The transcribed text from the audio data

Error Handling

  • If the STT provider is unsupported, the server raises a ValueError.
  • If the WebSocket connection is disconnected, the server logs the disconnection and closes the STT model.

Example Usage

Connecting to the WebSocket:

const socket = new WebSocket('ws://localhost:8000/api/v1/ws');

socket.onopen = () => {
  console.log('Connection opened');
};

socket.onmessage = (event) => {
  // event.data holds the transcribed text sent by the server
  console.log('Received message:', event.data);
};

socket.onclose = (event) => {
  console.log(`Connection closed: ${event.code} ${event.reason}`);
};

socket.onerror = (error) => {
  console.error('WebSocket error:', error);
};

Sending a Binary Message:

const audioData = new Uint8Array([/* audio data */]);
socket.send(audioData); // typed arrays are sent as binary frames
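
For non-browser clients, the same flow can be sketched in Python. This is an illustration under stated assumptions: it relies on the third-party `websockets` package, and the chunk size, idle timeout, and function names are choices made for the example rather than requirements of the server.

```python
import asyncio

CHUNK_BYTES = 3200  # 100 ms of 16-bit, 16000 Hz, mono audio (illustrative choice)


def iter_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Yield successive fixed-size slices of a PCM buffer (the last may be shorter)."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


async def stream_audio(url: str, pcm: bytes, idle_timeout: float = 2.0) -> list:
    """Send PCM audio as binary frames and collect the text frames sent back."""
    # Imported here so the chunking helper works without the package installed.
    import websockets  # third-party: pip install websockets
    from websockets.exceptions import ConnectionClosed

    transcripts = []
    async with websockets.connect(url) as ws:
        for chunk in iter_chunks(pcm):
            await ws.send(chunk)  # bytes objects are sent as binary frames
        # The server replies only when it has text, so stop after a quiet period.
        try:
            while True:
                transcripts.append(await asyncio.wait_for(ws.recv(), idle_timeout))
        except (asyncio.TimeoutError, ConnectionClosed):
            pass
    return transcripts


# Usage, assuming a server is listening locally:
# asyncio.run(stream_audio("ws://localhost:8000/api/v1/ws", pcm_bytes))
```

Sending fixed-size chunks keeps individual messages small. A real client capturing live audio would send each chunk as it is recorded instead of all at once.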

FastAPI Application Code

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
import os
import logging
from groq_stt import GroqSTT
from deepgram_stt import DeepGramSTT
from faster_whisper_stt import FasterWhisperSTT

app = FastAPI()
STT_PROVIDER = os.environ.get("STT_PROVIDER", "faster_whisper")

# Configure logging
logger = logging.getLogger(__name__)

class STTFactory:
    """Creates and caches a single STT instance for the process."""

    _instance = None

    @classmethod
    def get_instance(cls, provider: str):
        if cls._instance is None:
            if provider == "faster_whisper":
                cls._instance = FasterWhisperSTT()
            elif provider == "groq":
                cls._instance = GroqSTT()
            elif provider == "deepgram":
                cls._instance = DeepGramSTT()
            else:
                raise ValueError(f"Unsupported STT provider: {provider}")
        return cls._instance

@app.websocket("/api/v1/ws")
async def websocket_endpoint(websocket: WebSocket):
    stt_model = STTFactory.get_instance(STT_PROVIDER)
    await websocket.accept()
    try:
        logger.info("WebSocket connection established.")

        # Callback that lets the STT model push text to the client
        async def text_handler(text):
            await websocket.send_text(text)

        await stt_model.initialize(text_handler)

        while True:
            # Each binary message is a chunk of PCM audio
            data = await websocket.receive_bytes()
            result = await stt_model.transcribe(data)

            if result:
                await websocket.send_text(result)

    except WebSocketDisconnect:
        await stt_model.close()
        logger.info("Client disconnected")

STT Interface Code

from abc import ABC, abstractmethod
from typing import Callable, Optional


class STTInterface(ABC):
    """Interface for speech-to-text services."""

    @abstractmethod
    async def initialize(self, text_handler: Optional[Callable] = None):
        """Initialize the STT service."""

    @abstractmethod
    async def transcribe(self, data: bytearray) -> str:
        """Transcribe audio data to text."""

    @abstractmethod
    async def close(self):
        """Close the connection to the service."""

    @abstractmethod
    def is_open(self) -> bool:
        """Check if the connection is open."""

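To illustrate what a provider has to supply, here is a hedged sketch of a stand-in implementation. `ByteCountSTT` is hypothetical and performs no real transcription. The interface is repeated (with its methods marked abstract) so the sketch runs on its own.

```python
import asyncio
from abc import ABC, abstractmethod


class STTInterface(ABC):
    """Same four methods as the interface in this document."""

    @abstractmethod
    async def initialize(self, text_handler=None): ...

    @abstractmethod
    async def transcribe(self, data: bytearray) -> str: ...

    @abstractmethod
    async def close(self): ...

    @abstractmethod
    def is_open(self) -> bool: ...


class ByteCountSTT(STTInterface):
    """Hypothetical stand-in: reports bytes received instead of transcribing."""

    def __init__(self):
        self._open = False
        self._handler = None

    async def initialize(self, text_handler=None):
        self._handler = text_handler  # kept for providers that push text later
        self._open = True

    async def transcribe(self, data: bytearray) -> str:
        if not self._open:
            raise RuntimeError("initialize() must be called first")
        return f"received {len(data)} bytes"

    async def close(self):
        self._open = False

    def is_open(self) -> bool:
        return self._open


async def demo() -> str:
    stt = ByteCountSTT()
    await stt.initialize()
    text = await stt.transcribe(bytearray(3200))
    await stt.close()
    return text
```

A stand-in like this can be handy for testing the WebSocket endpoint without loading a model or calling an external service.
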
This documentation provides an overview of the WebSocket API, including connection details, message formats, events, and example usage for the STT service.