Speech Recognition (STT)

Auto-send Avatars

For avatars that use voice input as the primary interface, you can automatically start speech recognition when the connection is established.

js
SDK.onStatus((status) => {
  if (status === 'CONNECTED_FINISH') {
    SDK.startListening();
  }
});

startListening()

Activates the microphone and starts speech recognition.

js
SDK.startListening();

endListening()

Ends speech recognition and deactivates the microphone. If any text was recognized, the avatar responds to it.

js
SDK.endListening();

cancelListening()

Cancels speech recognition and deactivates the microphone. Any recognized text is discarded, and the avatar does not respond.

js
SDK.cancelListening();
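
The three calls above map naturally onto a push-to-talk control: start on press, submit on release, discard on abort. The following is a minimal sketch, not part of the SDK. The helper `createPushToTalk` and its method names are hypothetical, and it assumes only that the object passed in exposes startListening(), endListening(), and cancelListening() as described above.

```javascript
// Hypothetical helper (not part of the SDK): wraps the three listening calls
// in a push-to-talk control. `sdk` is any object that exposes
// startListening(), endListening(), and cancelListening().
function createPushToTalk(sdk) {
  let listening = false;

  return {
    // Call when the user presses the talk button.
    press() {
      if (listening) return; // ignore repeated presses
      sdk.startListening();
      listening = true;
    },
    // Call when the user releases the button: submit what was recognized.
    release() {
      if (!listening) return;
      sdk.endListening();
      listening = false;
    },
    // Call when the user aborts: discard what was recognized.
    abort() {
      if (!listening) return;
      sdk.cancelListening();
      listening = false;
    },
    // Reports whether this helper currently considers a session open.
    isListening() {
      return listening;
    },
  };
}
```

In a page you would create the control once with createPushToTalk(SDK) and wire press(), release(), and abort() to your button events. The local `listening` flag guards against calling endListening() or cancelListening() when no session was started by this helper.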

STT session and speech detection are separate concepts

startListening() / endListening() control the STT session (microphone ON/OFF). USER_SPEECH_STARTED / USER_SPEECH_STOPPED mark speech segments detected within a session; the microphone stays active throughout. Receiving USER_SPEECH_STOPPED does NOT end the STT session: you must call endListening() to stop it.

js
SDK.onSignal((data) => {
  switch (data.signal) {
    case 'USER_SPEECH_STARTED':
      // User started speaking (microphone was already on)
      console.log('User speech started');
      break;
    case 'USER_SPEECH_STOPPED':
      // User stopped speaking (microphone is still on)
      console.log('User speech stopped');
      break;
    case 'STT_RESULT':
      console.log('Recognition result:', data.payload.text);
      break;
  }
});
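
Because USER_SPEECH_STOPPED does not close the session, an application that wants hands-free turn-taking has to decide for itself when to call endListening(). One possible approach is to end the session after a period of silence. The sketch below is illustrative only: `endAfterSilence` and the `silenceMs` parameter are hypothetical names, and it assumes that registering this handler with onSignal() does not replace handlers you registered elsewhere, which the text above does not confirm.

```javascript
// Hypothetical helper (not part of the SDK): ends the STT session once the
// user has been silent for `silenceMs` milliseconds.
// `sdk` must expose onSignal(callback) and endListening().
// The timer functions are injectable so the logic can be tested without waiting.
function endAfterSilence(sdk, silenceMs, setTimer = setTimeout, clearTimer = clearTimeout) {
  let timer = null;

  sdk.onSignal((data) => {
    if (data.signal === 'USER_SPEECH_STARTED') {
      // The user resumed speaking: keep the session open.
      if (timer !== null) {
        clearTimer(timer);
        timer = null;
      }
    } else if (data.signal === 'USER_SPEECH_STOPPED') {
      // Restart the silence countdown; the microphone is still on.
      if (timer !== null) clearTimer(timer);
      timer = setTimer(() => {
        timer = null;
        sdk.endListening(); // explicitly close the session
      }, silenceMs);
    }
  });
}
```

For example, endAfterSilence(SDK, 1500) would close the session 1.5 seconds after the last USER_SPEECH_STOPPED, unless the user starts speaking again first. The delay value is a tuning choice for your application, not a value recommended by the SDK.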

Full STT Flow