Bug: Twilio Extension Fails to interrupt playback if audio has already been fully sent. #1173

@tmmvn

Please read this first

  • Have you read the docs? Agents SDK docs
  • Have you searched for related issues? Others may have faced similar issues.

Describe the bug

During a longer response (for example, a legal disclaimer with more than 10 s of playback), session.interrupt() (which in turn calls transport.interrupt()) does not appear to interrupt Twilio playback.

It seems that TwilioRealtimeTransport only implements _interrupt. However, the superclass interrupt returns silently if all the audio has already been submitted, and never calls _interrupt.

The variables seem to be reset here, in code that in turn gets called on response.output_audio.done.

However, Twilio uses marks to indicate when playback has finished. So if generation plus submission to Twilio takes 1 s but the actual response is 10 s long, there is only a 1 s window in which an interrupt correctly clears the response stream.
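The timing window described above can be modeled with a small self-contained sketch. Every name in it (SketchBaseTransport, SketchTwilioTransport, onAudioDelta, onAudioDone, the 'MZ-hypothetical' stream SID) is a hypothetical stand-in, not the SDK's actual code. It only assumes what this report describes: state is reset when audio generation finishes, and interrupt() returns early once that state is gone.

```typescript
// Hypothetical model of the reported behaviour; none of these names come from the SDK.
type TwilioMessage = { event: string; streamSid: string };

class SketchBaseTransport {
  // Set while audio for the current response is still being generated or sent.
  protected currentItemId: string | undefined;

  onAudioDelta(itemId: string): void {
    this.currentItemId = itemId;
  }

  // Assumption from the report: state is reset when generation ends,
  // not when the caller has finished hearing the audio.
  onAudioDone(): void {
    this.currentItemId = undefined;
  }

  interrupt(): void {
    if (this.currentItemId === undefined) {
      return; // returns silently; _interrupt is never reached
    }
    this._interrupt();
    this.currentItemId = undefined;
  }

  protected _interrupt(): void {}
}

class SketchTwilioTransport extends SketchBaseTransport {
  readonly sent: TwilioMessage[] = [];
  private streamSid: string;

  constructor(streamSid: string) {
    super();
    this.streamSid = streamSid;
  }

  // The only place a 'clear' is sent in this model.
  protected _interrupt(): void {
    this.sent.push({ event: 'clear', streamSid: this.streamSid });
  }
}

// Returns the messages that would have been sent to Twilio.
function simulate(interruptBeforeDone: boolean): TwilioMessage[] {
  const transport = new SketchTwilioTransport('MZ-hypothetical');
  transport.onAudioDelta('item_1');
  if (interruptBeforeDone) transport.interrupt();
  transport.onAudioDone();
  if (!interruptBeforeDone) transport.interrupt();
  return transport.sent;
}
```

In this model, simulate(true) yields one clear message, while simulate(false) yields none, even though the caller could still be hearing buffered audio.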

Debug information

  • Agents SDK version: 0.8.3
  • Runtime environment (e.g. Node.js 22.16.0)

Repro steps

Give the agent an instruction like:

Read the following disclaimer:
All the information on this website is published in good faith and for general information purpose only. Website Name does not make any warranties about the completeness, reliability and accuracy of this information. Any action you take upon the information you find on this website (Website.com), is strictly at your own risk. will not be liable for any losses and/or damages in connection with the use of our website.

Create your Twilio sessions, and then register an event listener like so:

this.session.on('transport_event', async (transportEvent) => {
  // Interrupt the agent as soon as the caller starts speaking.
  if (transportEvent.type === 'input_audio_buffer.speech_started') {
    try {
      this.session.interrupt();
    } catch (interruptErr) {
      console.warn('interrupt() failed (race condition):', interruptErr?.message || interruptErr);
    }
  }
});

Then after connecting start with

this.session.transport.sendEvent({ type: 'response.create' });

so the agent will talk first.

Let the agent talk for a bit, then interrupt it. Notice that the agent keeps talking unless the interruption happens during generation or while audio is still being transferred to Twilio.

Now, add this code after (or before) the interrupt code above:

if (this.twilioWebSocket) {
  this.twilioWebSocket.send(JSON.stringify({ event: 'clear', streamSid: this.payload?.streamSid }));
}

Notice the playback now gets interrupted.
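As a rough sketch, the workaround above could be wrapped in a small helper that guards against a missing stream SID or a closed socket. The names SocketLike and clearTwilioPlayback are hypothetical and not part of the SDK. The message shape is the same clear event used in the snippet above, and the value 1 is the standard WebSocket OPEN ready state.

```typescript
// Hypothetical helper; only the { event: 'clear', streamSid } message comes from the report.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

const WS_OPEN = 1; // same value as WebSocket.OPEN

// Asks Twilio to drop any buffered audio for the given stream.
// Returns true if a clear message was sent, false if it was skipped.
function clearTwilioPlayback(
  socket: SocketLike | undefined,
  streamSid: string | undefined,
): boolean {
  if (!socket || !streamSid || socket.readyState !== WS_OPEN) {
    return false;
  }
  socket.send(JSON.stringify({ event: 'clear', streamSid }));
  return true;
}
```

It would be called next to this.session.interrupt() in the speech_started handler, for example clearTwilioPlayback(this.twilioWebSocket, this.payload?.streamSid).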

Expected behavior

Sending an interrupt to the Twilio Realtime Session should interrupt audio playback.

A likely fix is to add a top-level interrupt method to TwilioRealtimeTransport that performs the clear, like so:

interrupt(cancelOngoingResponse: boolean = true) {
  // ALWAYS clear the Twilio buffer immediately when interrupted,
  // even if OpenAI has already finished generating the response.
  this.#twilioWebSocket.send(
    JSON.stringify({
      event: 'clear',
      streamSid: this.#streamSid,
    }),
  );
  super.interrupt(cancelOngoingResponse);
}

and removing the clear from _interrupt (though I think there is no harm in clearing twice).
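To check the idea in isolation, here is a minimal model of the proposed change. ModelBaseTransport and FixedTwilioTransport are hypothetical names, and the base class only imitates the early-return behaviour described in this report, not the SDK's real implementation. The subclass overrides interrupt to always record a clear and leaves _interrupt without one, so each interrupt produces exactly one clear.

```typescript
// Hypothetical model; the base class imitates the early return described in the report.
class ModelBaseTransport {
  protected audioInFlight = false;

  startAudio(): void {
    this.audioInFlight = true;
  }

  finishAudio(): void {
    this.audioInFlight = false;
  }

  interrupt(cancelOngoingResponse: boolean = true): void {
    if (!this.audioInFlight) return; // nothing left to cancel upstream
    this._interrupt(cancelOngoingResponse);
    this.audioInFlight = false;
  }

  protected _interrupt(_cancelOngoingResponse: boolean): void {}
}

class FixedTwilioTransport extends ModelBaseTransport {
  // Stands in for messages written to the Twilio WebSocket.
  readonly clears: string[] = [];
  private streamSid: string;

  constructor(streamSid: string) {
    super();
    this.streamSid = streamSid;
  }

  // Always clear Twilio's buffer first, then let the base class decide
  // whether there is still an upstream response to cancel.
  interrupt(cancelOngoingResponse: boolean = true): void {
    this.clears.push(JSON.stringify({ event: 'clear', streamSid: this.streamSid }));
    super.interrupt(cancelOngoingResponse);
  }

  // _interrupt is intentionally not overridden here, so no second clear is sent.
}
```

With this shape, an interrupt that arrives after finishAudio() still records a clear, and one that arrives mid-generation records exactly one rather than two.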
