Question

Monitoring invite arrives 3-4 seconds after Answered, causing transcript to miss first seconds of call

  • April 13, 2026
  • 2 replies
  • 26 views

We are using `ringcentral-softphone-ts` as the monitoring softphone in a RingCentral call supervision setup.

Our flow is:

1. Receive telephony session notifications for monitored agents
2. When party status becomes `Answered`, call:
   `POST /restapi/v1.0/account/~/telephony/sessions/{sessionId}/parties/{partyId}/supervise`
3. Wait for the monitoring SIP invite in `softphone.on('invite', ...)`
4. Answer the invite
5. Start consuming `audioPacket` and stream audio to transcription
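
Steps 1 and 2 above can be sketched as a small pure function that picks which parties to supervise and builds the request path. This is only an illustration: the `Party` and `TelephonySessionEvent` shapes are simplified assumptions about the notification payload, and `partiesToSupervise` and `supervisePath` are hypothetical helper names, not part of `ringcentral-softphone-ts` or any RingCentral SDK.

```typescript
// Assumed (simplified) shape of a telephony session notification;
// only the fields this sketch reads are declared.
interface Party {
  id: string;
  status: { code: string };
}

interface TelephonySessionEvent {
  body: { telephonySessionId: string; parties: Party[] };
}

// Returns the IDs of parties that are `Answered` and have not been handed
// to /supervise yet. `requested` is mutated so that repeated notifications
// for the same party do not trigger a second request.
function partiesToSupervise(
  event: TelephonySessionEvent,
  requested: Set<string>,
): string[] {
  const sessionId = event.body.telephonySessionId;
  const result: string[] = [];
  for (const party of event.body.parties) {
    const key = `${sessionId}:${party.id}`;
    if (party.status.code === "Answered" && !requested.has(key)) {
      requested.add(key);
      result.push(party.id);
    }
  }
  return result;
}

// Builds the request path from step 2 for one party.
function supervisePath(sessionId: string, partyId: string): string {
  return `/restapi/v1.0/account/~/telephony/sessions/${sessionId}/parties/${partyId}/supervise`;
}
```

Keeping the decision separate from the HTTP call makes it easy to confirm that the worker issues the request on the first `Answered` notification and does not repeat it when the same party shows up again.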

This works, but we consistently lose the first ~3-4 seconds of the call transcript.

Observed behavior:
- The first transcript segment always starts a few seconds after the actual human conversation begins
- The delay appears to happen before audio packets begin arriving, i.e. between `Answered` -> `/supervise` -> monitoring invite -> audio start

Questions:
1. Is `Answered` the earliest supported moment to call `/supervise` for call monitoring?
2. Can `/supervise` be called earlier, such as during `Setup` or `Proceeding`, to reduce or eliminate the initial audio gap?
3. Is there any recommended approach to capture audio from the very start of the call when using RingCentral supervision/monitoring?
4. Is the observed 3-4 second delay expected behavior for monitoring leg creation?
5. Are there any best practices to reduce this attach delay?

Relevant details:
- Monitoring extension is correctly configured in the call monitoring group
- SIP monitoring device is reachable and invites are answered successfully
- Once the invite is received, `audioPacket` events flow normally
- The problem is the gap before the monitoring invite/audio starts

Example worker sequence:
- telephony event: `Answered`
- call `/supervise`
- receive monitoring SIP invite
- answer invite
- start receiving `audioPacket`
- transcription begins, but several seconds into the call

Any guidance would be appreciated, especially on whether there is a supported way to avoid missing the opening seconds of the conversation.
 

2 replies

PhongVu
Community Manager
  • April 13, 2026

The delay may be expected, but to be honest, I have never measured it.

Unfortunately, call supervision cannot start before the call is connected. Can you do me a favor? Log the times and break down the delay to see where the bottleneck is.

From the telephony session event, when you receive the “Answered” event, log your server's local time and record both the event timestamp (`timestamp`) and the session event time (`body.eventTime`).

E.g.

{
  "event": "/restapi/v1.0/account/80964XXX/extension/62288YYYY/telephony/sessions",
  "timestamp": "2023-02-08T16:37:43.705Z",
  "body": {
    "sequence": 3,
    ...
    "eventTime": "2023-02-08T16:37:43.664Z",
    "parties": [ …
}

Then log the time at which you request the supervision, the time at which your softphone receives the SIP invite, and the time at which the first audio packet arrives.

Let’s see where the long delay is and whether it can be improved.
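
As a rough sketch of the first measurement, the two webhook delays can be computed from the notification's `timestamp` and `body.eventTime` plus the server's receive time. The `WebhookEnvelope` type, the `webhookDelays` function and the result field names are hypothetical and only declare the fields shown in the example payload.

```typescript
// Only the two timestamp fields from the example payload are declared here.
interface WebhookEnvelope {
  timestamp: string; // top-level notification timestamp (ISO 8601)
  body: { eventTime: string }; // session event time (ISO 8601)
}

interface WebhookDelays {
  eventToServerReceiveMs: number;
  bodyEventToServerReceiveMs: number;
}

// Computes how long after each RingCentral timestamp the notification
// reached the server. `receivedAt` should be captured as early as possible
// in the webhook handler.
function webhookDelays(envelope: WebhookEnvelope, receivedAt: Date): WebhookDelays {
  const received = receivedAt.getTime();
  return {
    eventToServerReceiveMs: received - Date.parse(envelope.timestamp),
    bodyEventToServerReceiveMs: received - Date.parse(envelope.body.eventTime),
  };
}
```

These values compare RingCentral's clock with the server's clock, so they include any clock skew between the two; the later measurements (supervise request, SIP invite, first audio packet) are all taken on the same server and are not affected by that.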


  • Author
  • The First Step
  • April 13, 2026

@PhongVu Thank you for the comment.

 

I added detailed timing logs in an isolated repro worker on a public server outside my local machine/network.

The repro:
- subscribes only to one monitored extension
- logs:
  - webhook receive time on my server
  - webhook top-level `timestamp`
  - webhook `body.eventTime`
  - when I log the `Answered` event locally
  - when `/supervise` is requested
  - when the SIP invite is received
  - when the invite is answered
  - when the first audio packet arrives
- saves the received monitored audio to WAV

Representative results from one clean run:

Webhook delivery for the `Answered` event:
- `event.timestamp -> server receive`: about `101 ms`
- `body.eventTime -> server receive`: about `180 ms`

Supervision/media timing:
- `Answered -> /supervise request`: about `53 ms`
- `Answered -> SIP invite received`: about `419 ms`
- `Answered -> invite answered`: about `919 ms`
- `Answered -> first audio packet`: about `1162 ms`

For the other leg in the same session:
- `Answered -> /supervise request`: about `70 ms`
- `Answered -> SIP invite received`: about `768 ms`
- `Answered -> invite answered`: about `1227 ms`
- `Answered -> first audio packet`: about `1409 ms`

However, across runs I do sometimes still see the delay closer to `~2-3 seconds` before the first monitored audio packet arrives. So the `~1.1-1.4 s` case appears to be a good run, but not the worst case.

The monitored audio WAV files confirm that the beginning of the conversation is missing from the received audio itself.

So from this isolated repro:
1. Webhook delivery to my server is relatively fast, around `100-180 ms`.
2. The main delay is after the call is answered:
   - RingCentral supervision attach
   - SIP invite delivery
   - media start
3. Even in this isolated setup, the first monitored audio generally arrives around `1.1-1.4 s` after the answered event, and in some runs it can still be closer to `~2-3 s`.

This suggests the remaining delay is mostly in the supervision/media attach flow, not in my original local/full stack.

Questions:
- Is `~1.1-1.4 s` from `Answered` to first monitored audio packet expected for RingCentral call supervision?
- Is it also expected that this can sometimes drift toward `~2-3 s`?
- Is there any supported way to reduce this further?
- Is there any earlier safe point than `Answered` to request supervision, or any option to receive monitor audio sooner?

Raw logs for one run:


[repro] telephony-event {"receivedAt":"2026-04-13T21:15:34.533Z","eventTimestamp":"2026-04-13T21:15:34.432Z","bodyEventTime":"2026-04-13T21:15:34.353Z","parties":[{"status":"Answered"}]}
[repro] answered-party {"serverReceivedAt":"2026-04-13T21:15:34.533Z","answeredLoggedAt":"2026-04-13T21:15:34.533Z","eventToServerReceiveMs":101,"bodyEventToServerReceiveMs":180}
[repro] supervise-party request {"requestedAt":"2026-04-13T21:15:34.586Z","answeredToRequestMs":53}
[repro] invite {"receivedAt":"2026-04-13T21:15:34.952Z","answeredToInviteMs":419,"superviseToInviteMs":366}
[repro] invite-answered {"answeredAt":"2026-04-13T21:15:35.452Z","answeredToInviteAnsweredMs":919,"superviseToInviteAnsweredMs":866}
[repro] first-audio-packet {"receivedAt":"2026-04-13T21:15:35.695Z","answeredToFirstAudioMs":1162,"superviseToFirstAudioMs":1109,"inviteToFirstAudioMs":743}

[repro] answered-party {"answeredLoggedAt":"2026-04-13T21:15:34.868Z","eventToServerReceiveMs":160,"bodyEventToServerReceiveMs":203}
[repro] supervise-party request {"requestedAt":"2026-04-13T21:15:34.938Z","answeredToRequestMs":70}
[repro] invite {"receivedAt":"2026-04-13T21:15:35.636Z","answeredToInviteMs":768,"superviseToInviteMs":698}
[repro] invite-answered {"answeredAt":"2026-04-13T21:15:36.095Z","answeredToInviteAnsweredMs":1227,"superviseToInviteAnsweredMs":1157}
[repro] first-audio-packet {"receivedAt":"2026-04-13T21:15:36.277Z","answeredToFirstAudioMs":1409,"superviseToFirstAudioMs":1339,"inviteToFirstAudioMs":641}
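
The cumulative figures in these logs can also be turned into stage-to-stage deltas. The sketch below parses lines in the `[repro] <stage> {json}` format shown above; `stageDeltas` and the `TIME_FIELD` mapping are hypothetical helpers written for this log format only, and the function assumes it is given the lines for a single leg in chronological order.

```typescript
// Which JSON field holds the timestamp for each log stage (matches the logs above).
const TIME_FIELD: Record<string, string> = {
  "supervise-party request": "requestedAt",
  "invite": "receivedAt",
  "invite-answered": "answeredAt",
  "first-audio-packet": "receivedAt",
};

interface Mark {
  stage: string;
  at: number; // epoch milliseconds
}

interface StageDelta {
  from: string;
  to: string;
  ms: number;
}

// Parses "[repro] <stage> {json}" lines and returns the elapsed time
// between each pair of consecutive recognized stages.
function stageDeltas(lines: string[]): StageDelta[] {
  const marks: Mark[] = [];
  for (const line of lines) {
    const match = /^\[repro\] (.+?) (\{.*\})$/.exec(line.trim());
    if (!match) continue;
    const stage = match[1] ?? "";
    const field = TIME_FIELD[stage];
    if (!field) continue; // stage not part of the supervise-to-audio chain
    const payload = JSON.parse(match[2] ?? "{}") as Record<string, unknown>;
    const value = payload[field];
    if (typeof value === "string") {
      marks.push({ stage, at: Date.parse(value) });
    }
  }
  const deltas: StageDelta[] = [];
  let previous: Mark | undefined;
  for (const mark of marks) {
    if (previous) {
      deltas.push({ from: previous.stage, to: mark.stage, ms: mark.at - previous.at });
    }
    previous = mark;
  }
  return deltas;
}
```

For the first leg above this gives 366 ms from the `/supervise` request to the invite, 500 ms from the invite to the invite being answered, and 243 ms from the answer to the first audio packet, which adds up to the logged `superviseToFirstAudioMs` of 1109.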