Speech-To-Text (STT)

Overview

This WebSocket API lets clients stream audio to a server for real-time speech-to-text (STT) transcription. The server supports multiple STT providers; it receives binary audio messages and replies with text messages.

Endpoint

URL: /api/v1/ws
Method: GET
Protocol: WebSocket

Environment Variables

STT_PROVIDER: The STT provider to use. Supported values are "faster_whisper", "groq", and "deepgram". The default is "faster_whisper".
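The lookup and default described above can be sketched as follows. This is a minimal illustration, not the server's actual startup code: the helper name resolve_provider and the SUPPORTED_PROVIDERS tuple are hypothetical, and the early validation is an assumption (the application code in this document only raises when the factory is first asked for an instance).

```python
import os

# Provider names listed in this document.
SUPPORTED_PROVIDERS = ("faster_whisper", "groq", "deepgram")


def resolve_provider(env=None) -> str:
    """Return the configured STT provider, defaulting to "faster_whisper".

    `env` is any mapping of environment variables; it defaults to os.environ.
    (Hypothetical helper, used here only for illustration.)
    """
    env = os.environ if env is None else env
    provider = env.get("STT_PROVIDER", "faster_whisper")
    if provider not in SUPPORTED_PROVIDERS:
        # Same error type and message the factory in this document uses.
        raise ValueError(f"Unsupported STT provider: {provider}")
    return provider
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise without touching the real environment.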

Connection

When a client connects to the WebSocket endpoint, the server accepts the connection, logs it, and initializes the STT model.

Messages

The server handles binary messages containing audio data.

Binary Messages

Format: Binary audio data. All STT providers expect PCM audio: 16-bit samples, 16000 Hz sample rate, mono.
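To make the format concrete, the sketch below converts floating-point samples to 16-bit PCM bytes and splits them into fixed-duration chunks suitable for sending as binary messages. The function names, the 100 ms chunk size, and the little-endian byte order are assumptions made for this example; the document does not specify byte order or chunk size.

```python
import struct

SAMPLE_RATE_HZ = 16000   # required sample rate
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 1             # mono


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit PCM bytes.

    Little-endian byte order is an assumption, not stated in this document.
    """
    # Clip out-of-range values so they cannot overflow the 16-bit range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm: bytes, chunk_ms: int = 100):
    """Yield consecutive chunks of PCM audio, each covering chunk_ms milliseconds."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

At this format, one second of audio is 16000 × 2 × 1 = 32000 bytes, so a 100 ms chunk is 3200 bytes.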

Events

message

Description: Triggered when a binary message (audio data) is received from the client.
Parameters:
- data: The binary message data (audio)

Handling Binary Messages:

The server will transcribe the binary data (audio) to text using the configured STT model and send the transcribed text back to the client.

close

Description: Triggered when the client closes the connection.
Parameters:
- code: The close code
- reason: The reason for closing

Server Responses

Text Responses

Format: Plain text
Content: The transcribed text from the audio data

Error Handling

If STT_PROVIDER names an unsupported provider, the server raises a ValueError when it creates the STT model.
If the client disconnects, the server closes the STT model and logs the disconnection.

Example Usage

Connecting to the WebSocket:

const socket = new WebSocket('ws://localhost:8000/api/v1/ws');

socket.onopen = () => {
  console.log('Connection opened');
};

socket.onmessage = (event) => {
  console.log('Received message:', event.data);
};

socket.onclose = (event) => {
  console.log(`Connection closed: ${event.code} ${event.reason}`);
};

socket.onerror = (error) => {
  console.error('WebSocket error:', error);
};

Sending a Binary Message:

const audioData = new Uint8Array([/* audio data */]);
socket.send(audioData);

FastAPI Application Code

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
import os
import logging
from groq_stt import GroqSTT
from deepgram_stt import DeepGramSTT
from faster_whisper_stt import FasterWhisperSTT

app = FastAPI()
STT_PROVIDER = os.environ.get("STT_PROVIDER", "faster_whisper")

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class STTFactory:
    _instance = None

    @classmethod
    def get_instance(cls, provider: str):
        if cls._instance is None:
            if provider == "faster_whisper":
                cls._instance = FasterWhisperSTT()
            elif provider == "groq":
                cls._instance = GroqSTT()
            elif provider == "deepgram":
                cls._instance = DeepGramSTT()
            else:
                raise ValueError(f"Unsupported STT provider: {provider}")
        return cls._instance

@app.websocket("/api/v1/ws")
async def websocket_endpoint(websocket: WebSocket):
    # Raises ValueError if STT_PROVIDER names an unsupported provider.
    stt_model = STTFactory.get_instance(STT_PROVIDER)
    await websocket.accept()
    try:
        logger.info("WebSocket connection established.")

        # Callback the STT model can use to push text to the client.
        async def text_handler(text):
            await websocket.send_text(text)

        await stt_model.initialize(text_handler)

        while True:
            # Each binary message carries a chunk of PCM audio.
            data = await websocket.receive_bytes()
            result = await stt_model.transcribe(data)

            # Skip empty results so the client only receives actual text.
            if result:
                await websocket.send_text(result)

    except WebSocketDisconnect:
        await stt_model.close()
        logger.info("Client disconnected")

STT Interface Code

from abc import ABC, abstractmethod
from typing import Awaitable, Callable, Optional

class STTInterface(ABC):
    """Interface for speech-to-text services."""

    @abstractmethod
    async def initialize(self, text_handler: Optional[Callable[[str], Awaitable[None]]] = None):
        """Initialize the STT service."""
        pass

    @abstractmethod
    async def transcribe(self, data: bytes) -> str:
        """Transcribe audio data to text."""
        pass

    @abstractmethod
    async def close(self):
        """Close the connection to the service."""
        pass

    @property
    @abstractmethod
    def is_open(self) -> bool:
        """Check if the connection is open."""
        pass
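As an illustration of what a provider has to supply, here is a minimal sketch of a class that satisfies the interface. ByteCountSTT is a hypothetical stand-in that does no real transcription; it only reports how many bytes it received. The interface is restated in condensed form so the block runs on its own, and the RuntimeError on an uninitialized service is an assumption of this sketch, not documented behavior.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Awaitable, Callable, Optional

TextHandler = Callable[[str], Awaitable[None]]


class STTInterface(ABC):
    """Condensed copy of the interface above, repeated so this block is self-contained."""

    @abstractmethod
    async def initialize(self, text_handler: Optional[TextHandler] = None): ...

    @abstractmethod
    async def transcribe(self, data: bytes) -> str: ...

    @abstractmethod
    async def close(self): ...

    @property
    @abstractmethod
    def is_open(self) -> bool: ...


class ByteCountSTT(STTInterface):
    """Hypothetical provider: returns a byte count instead of a transcript."""

    def __init__(self):
        self._open = False
        self._text_handler: Optional[TextHandler] = None

    async def initialize(self, text_handler: Optional[TextHandler] = None):
        # Keep the callback so a streaming provider could push text later.
        self._text_handler = text_handler
        self._open = True

    async def transcribe(self, data: bytes) -> str:
        if not self._open:
            raise RuntimeError("STT service is not initialized")
        return f"received {len(data)} bytes"

    async def close(self):
        self._open = False

    @property
    def is_open(self) -> bool:
        return self._open


async def demo():
    """Run one initialize / transcribe / close cycle."""
    stt = ByteCountSTT()
    await stt.initialize()
    text = await stt.transcribe(b"\x00\x00" * 160)  # 160 silent 16-bit samples
    await stt.close()
    return text, stt.is_open
```

Calling asyncio.run(demo()) drives the same initialize, transcribe, and close sequence that the WebSocket endpoint performs for each connection.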

This documentation provides an overview of the WebSocket API, including connection details, message formats, events, and example usage for the STT service.
